PREFACE


The  behavioral  research  conducted  under  the  Naval  Training 
Equipment  Center's  Visual  Technology  Research  Simulator  (VTRS) 
program  consists  of  experiments  to  provide  the  basis  for  design 
criteria for flight trainers. Because it is necessary to investigate
the effects of a great many simulator features, much
attention  has  been  given  to  the  use  of  experimental  methods 
capable of handling complex multifactor problems. The author
of this report, Dr. Charles W. Simon, has devoted the past
decade to the study of means to improve the quality and usefulness
of behavioral research through the use of methods that are,
in  many  respects,  quite  different  from  those  typically  used  by 
applied  behavioral  scientists.  The  critical  difference  is  that 
with  the  "new  paradigm"  (a  term  used  by  Simon  to  refer  to  the 
philosophy,  strategy  and  techniques  he  discusses),  variables  are 
examined  with  a  gradually  increasing  precision  as  more  is  learned 
about their effects. The advantage is that the experimenter is
less constrained to investigate only a few variables at a time.
He is not forced to hold constant (or allow to vary in some
unknown way) other factors that may interact in important ways with
those under study.

The  experimenter  initially  looks  at  many  things  with  the 
intent  of  screening  out  those  which  are  trivial  for  a  particular 
task. The non-trivial factors are then investigated further
until  ultimately  a  sufficiently  precise  equation  is  generated 
and  verified.  With  this  approach  it  would  often  be  possible, 
with  no  increase  in  the  amount  of  data  collected,  to  obtain  the 
same  amount  of  useful  information  on  perhaps  50  variables  and 
their  interrelationships  as  would  ordinarily  be  obtained  on  five 
variables using traditional methods. Predictions from the laboratory
to the field can then be made with greater confidence, and
a  quantitative  data  base  is  established  which  can  be  augmented 
easily. 


A  report  recently  published  as  NAVTRAEQUIPCEN  77-C-0065-1 
(Simon 1979) summarized the ways in which the methods he advocates
should be applied to the VTRS program. The present report
supplements  that  document  by  providing  additional  information  to 
aid  in  the  design  and  interpretation  of  multifactor  experiments. 
The  information  presented  here  was  required  in  preparation  for  a 
screening experiment recently completed on aircraft carrier landing
performance. A description of this research will appear as
NAVTRAEQUIPCEN 78-C-0060-7. The practical implementation of the
techniques will be discussed in that report.

There are many fields of experimental psychology for which
the approach advocated by Simon can result in more useful information
obtained at a lower cost. This report will therefore be
applicable  to  a  wide  range  of  research  topics  besides  flight 
training.  Because  it  assumes  that  the  reader  is  familiar  with 
Simon’s  earlier  work,  it  should  be  regarded  as  a  "companion  piece" 
to NAVTRAEQUIPCEN 77-C-0065-1, which provides much of the background
information a new reader would require.


STANLEY  C.  COLLYER 
Scientific Officer

SECTION I
INTRODUCTION 

The  Naval  Training  Equipment  Center  has  built  a  Visual 
Technology  Research  Simulator  (VTRS,  formerly  referred  to  as 
AWAVS)  composed  of  a  cockpit,  a  wide-angle  visual  system,  and 
a six-degrees-of-motion system. These combine into a versatile
device for studying the effects of equipment parameters
in  the  context  of  pilot  training.  The  large  number  of 
parameters  that  must  be  investigated  requires  the  use  of 
experimental  methods  that  permit  studying  many  factors 
economically. A discussion of the philosophy, strategy, and
techniques  being  employed  in  much  of  the  research  conducted 
on  this  program  has  been  provided  elsewhere  (Simon  1979). 

This  report  is  made  up  of  a  series  of  individual  papers 
on  different  techniques  needed  to  enhance  the  methodologies 
that  are  to  be  used  in  the  VTRS  human  performance  experiments. 

In the series of reports by Simon (1970-1979) on a new
paradigm  for  psychological  research,  the  holistic  approach  to 
systematic  experimentation  is  proposed  and  the  strategies  and 
techniques  for  accomplishing  this  are  described.  While  the 
basic  tools  required  to  employ  the  "new  paradigm"  (Simon,  1977b) 
in  the  VTRS  program  are  available,  there  are  still  techniques 
that  need  to  be  understood  in  detail  to  supplement  the  use  of 
those  described  in  the  original  documents. 

As  a  part  of  this  year's  effort,  supplemental  procedures 
for  the  design,  analysis,  and  interpretation  of  economical 
multifactor  experiments  were  sought.  The  relevant  ones  are 
described here. None is original with this investigator.
They have been included here to reduce the time required to
search them out, to read and collate related source material,
and  to  relate  them  to  the  "new  paradigm."  After  using  this 
report  to  obtain  a  basic  understanding  of  these  techniques, 
the  reader  is  encouraged  to  read  the  original  material. 

The  following  techniques  are  discussed: 

a.  WHAT  TO  DO  WHEN  THE  MODEL  FOR  THE  EXPERIMENTAL 
DESIGN  INADEQUATELY  REPRESENTS  THE  EMPIRICAL  DATA 

i  THE  PROBLEM 

ii  LACK  OF  FIT  TEST 

iii  TRANSFORMATION

iv  AUGMENTATION

b.  USING  YATES'  ALGORITHM  WITH  SCREENING  DESIGNS 

c.  ANALYZING RESIDUALS

d.  IDENTIFYING THE EXPERIMENTAL CONDITIONS IN $2^{k-p}$
DESIGNS WHEN GIVEN THE DEFINING GENERATORS

e.  AN  ECONOMICAL  DESIGN  FOR  SCREENING  INTERACTION 
EFFECTS 

f.  GRAPHIC  METHOD  AND  INTERNAL  COMPARISONS  FOR 
MULTIPLE  RESPONSE  DATA 

g.  THE  PLACE  FOR  REPLICATION  IN  ECONOMICAL 
MULTIFACTOR  RESEARCH 

h.  THE  SIGNIFICANCE  OF  TESTS  OF  STATISTICAL 
SIGNIFICANCE 

i.  DETERMINING  THE  PROBABILITY  OF  ACCEPTING  THE 
NULL  HYPOTHESIS  WHEN  IN  FACT  IT  IS  FALSE 

j.  TESTING NON-ADDITIVITY IN EXPERIMENTAL DATA
FROM  A  LATIN  SQUARE  DESIGN 

k.  HOW  TO  INCLUDE  FACTORS  WITH  MORE  THAN  TWO  LEVELS 
IN  A  SCREENING  DESIGN 

l.  ANALYZING  EXTRA-PERIOD  CHANGE-OVER  DESIGNS 

m.  ANALYZING  SERIALLY-BALANCED  SEQUENCE  DESIGNS 

n.  DESIGN  ECONOMY  WHEN  EXPERIMENTAL  FACTORS 
SELECTIVELY AFFECT BI-VARIATE CRITERIA


RELATING  THE  CONTENT  OF  THIS  REPORT  TO  PREVIOUS  REPORTS 

The  new  paradigm  for  research  on  equipment  design  developed 
by  Simon  (1970-1979)  emphasizes  the  importance  of  a  multifactor 
approach  involving  "all"  critical  parameters  of  a  particular 
task  and  provides  the  practical  and  economical  strategies  and 
techniques  for  accomplishing  this.  Much  of  the  material 
written  about  the  basic  approach  has  been  presented  as  if  it 
were  a  constant,  unvarying  process.  In  practice,  however,  the 
comprehensiveness  of  the  approach  and  the  vagaries  of  humans 
performing  complex  tasks  in  less  than  optimum  environments 
make any "cookbook" approach inadequate. The investigators
must  be  prepared  to  handle  variations  upon  the  basic  approach 
and  to  deal  with  the  complexities  of  the  problem  as  it  exists 
in  the  real  world.  If  they  cannot,  the  holistic  approach  for 
behavioral  research  will  not  be  successful.  The  material 
in  this  report  is  intended  to  supplement  the  material  already 
written in order to better prepare the investigator for conducting
experiments on VTRS and similar programs. The discussion
below relates the sections in this report to previous
materials dealing with problems of design, analysis, and
interpretation.

Design

The term "design" will be used here in the limited
sense of the selection of coordinates in a multifactor
space where performance data is to be collected. In
the  basic  approach  to  economical  multifactor  experiments,  data 
is  collected  in  blocks  beginning  with  the  points  which  form 
designs  of  lower  order  so  that  a  sequential  build-up  of  designs 
of  increasing  order  will  be  possible.  By  testing  after  each 
block  of  data  is  collected,  the  investigator  can  develop  a 
polynomial  of  appropriate  order  to  be  fitted  "as  close  as 
possible"  to  the  true  unknown  response  function  while  using  as 
few experimental runs as is consistent with other objectives.
The basic design used in this approach is a $2^{k-p}$ fractional
factorial, which is modified to achieve the experimental goal.
Items  a-iv,  d,  e,  g,  k,  and  n  are  concerned  with  some  of  the 
most  frequently  encountered  modifications. 

After  the  design  has  been  expanded  to  approximate  a  second 
order  model,  it  is  uneconomical  to  collect  —  if  needed  —  all 
of  the  points  required  for  a  complete  design  of  higher  order. 
Instead,  the  investigator  must  know  how  to  augment  the  data 
collection  space  with  points  that  will  isolate  specifically 
chosen sources of variance (Item a-iv). Sources that supply
unusual experimental data collection plans may provide only
the "defining generators," that is, a succinct coded description
of the design. An investigator must know how to determine
which experimental conditions (i.e., the coordinates of the
experimental space) make up this design (Item d). Ordinarily,
investigators  are  interested  in  identifying  which  factors  are 
the  most  important  and  most  designs  —  particularly  the 
economical  multifactor  designs  —  are  constructed  to  reflect 
this interest. When interactions are expected to be predominant
and a large number of factors are being investigated, the
investigator who is familiar with an economical plan for
screening interaction effects can save much time and effort
(Item e). Most psychologists replicate basic experimental
designs  almost  automatically,  whether  or  not  it  serves  a  useful 
purpose  and  without  regard  for  the  extra  data  collection  costs 
involved.  When  large  multifactor  experiments  are  to  be 
performed  with  reasonable  economy,  the  investigator  must 
understand  when  replicating  is  and  is  not  necessary  and  what 
more  economical  alternatives  to  complete  replication  are 
available (Item g). While the two-level design is usually
suitable  for  most  screening  studies  at  the  beginning  of  a 
large  multifactor  investigation,  there  are  times  when  the 
investigator  may  wish  to  examine  one  or  two  factors  at  three 
(or  even  four)  levels  when  the  first  block  or  two  of  data  is 
being  collected.  He  must  be  aware  of  when  this  is  reasonable, 
what  alternatives  are  open  to  him,  and  if  he  decides  to  go 
ahead,  how  to  fit  the  three-level  factor  economically  into 
the two-level basic design (Item k). For truly holistic experimentation,
multifactor experiments will commonly require
multivariate criteria. The investigator should be aware of
those special circumstances when the experimental data
collection can be made more economical (Item n).
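
As an illustration of what identifying the experimental conditions from the defining generators (Item d) involves, the following is a minimal sketch rather than the procedure given later in the report. The function name `fractional_factorial`, the factor letters, and the generators D = AB and E = AC are hypothetical choices made only for this example.

```python
from itertools import product


def fractional_factorial(base_factors, generators):
    """Return the runs of a 2^(k-p) design as dicts of -1/+1 levels.

    base_factors: names of the k-p factors that form a full two-level factorial.
    generators:   maps each added factor to the base factors whose product
                  defines its level (e.g. {"D": "AB"} means D = AB).
    """
    runs = []
    for levels in product((-1, 1), repeat=len(base_factors)):
        run = dict(zip(base_factors, levels))
        for new_factor, word in generators.items():
            level = 1
            for name in word:
                level *= run[name]  # product of the named base-factor levels
            run[new_factor] = level
        runs.append(run)
    return runs


# Hypothetical 2^(5-2) design: five factors in eight runs.
design = fractional_factorial("ABC", {"D": "AB", "E": "AC"})
for run in design:
    print(" ".join(f"{name}={level:+d}" for name, level in run.items()))
```

Each printed line is one experimental condition, that is, one coordinate of the experimental space. The generators alone fix the levels of the added factors in every run.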


Analysis 


Too  often,  the  analysis  of  behavioral  research  data  has 
been  limited  to  a  routine,  computerized  treatment.  This  is 
not  sufficient  when  the  data  is  as  rich  in  information  as  that 
which  comes  from  a  multifactor  design.  Furthermore,  when 
"economical"  designs  are  used,  it  is  necessary  not  only  to 
analyze  for  content,  but  also  for  data  quality  in  order  to 
avoid  potential  misinterpretations.  Items  a-iii,  b,  c,  and 
f deal with topics related to the analysis of such data.


In  some  cases,  proper  data  analysis  may  be  substituted 
for  the  additional  data  collection  required  by  a  more  complex 
design.  Before  one  adds  a  new  block  of  data  to  isolate 
higher  order  interaction  effects  as  prescribed  by  sequential 
design  strategy  described  earlier,  it  is  more  economical  to 
examine  transformations  of  the  original  performance  data  to 
see  if  a  simpler  model  can  be  effected  without  requiring  more 
data to be collected. With multifactor designs, the investigator
must be knowledgeable about special techniques necessary
to optimize transformations across all variables (Item a-iii).
When $2^{k-p}$ designs are used as blocks in the sequential
strategy, using Yates' algorithm for data analysis can prove
to be more effective than computerized regression routines
when the number of factors is more than the computer system
can handle. With complete $2^k$ factorials, Yates' algorithm
is easy to use; with fractional $2^{k-p}$ designs, including the
special case of the robust screening design, some translation
of the results is required. The investigator must be able to
make this translation in order to interpret his data (Item b).
Analyzing the residuals between obtained and estimated performance
scores provides valuable information for the
investigator  who  wishes  to  understand  the  quality  of  his  data, 
to  decide  what  the  next  step  in  the  experiment  should  be,  and 
to  interpret  the  existing  data.  Most  psychologists  are  not 
aware  of  the  usefulness  of  this  type  of  analysis  and  should 
be (Item c). Holistic research is expected to deal with a
complex  world  —  multiple  independent  variables  and  multiple 
criteria.  Graphic  methods,  so  helpful  in  understanding  data 
from  single-criteria  experiments,  can  also  be  useful  when 
multiple criteria are employed. This latter analysis, however,
is more complicated to perform and to interpret; the investigator
needs simplified explanations of both (Item f).
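
For readers unfamiliar with Yates' algorithm (Item b), the sketch below shows the usual sum-and-difference passes for a complete $2^k$ factorial. It assumes the responses are supplied in standard (Yates) order; the function name `yates` and the four response values are hypothetical.

```python
def yates(responses):
    """Yates' algorithm for a complete 2^k factorial.

    responses must be in standard order: (1), a, b, ab, c, ac, bc, abc, ...
    Returns the grand mean and a dict of effect estimates keyed by label.
    """
    n = len(responses)
    k = n.bit_length() - 1
    if k < 1 or n != 2 ** k:
        raise ValueError("the number of responses must be a power of two")

    column = list(responses)
    for _ in range(k):
        # Each pass: sums of adjacent pairs on top, differences underneath.
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs

    grand_mean = column[0] / n
    effects = {}
    for index in range(1, n):
        # Bit j of the row index marks the presence of the j-th factor letter.
        label = "".join(chr(ord("A") + j) for j in range(k) if (index >> j) & 1)
        effects[label] = column[index] / (n / 2)
    return grand_mean, effects


# Hypothetical 2^2 data in standard order: (1), a, b, ab.
grand_mean, effects = yates([10, 20, 30, 40])
print(grand_mean, effects)
```

With a fractional $2^{k-p}$ design the same passes can be applied to the full factorial in the k-p base factors. Each resulting row then stands for a string of aliased effects, which is the translation the text refers to.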


Psychologists  have  employed  Latin  square  designs  for 
many  years  to  isolate  the  effects  of  treatments,  subjects, 
and  trials  when  the  same  subject  is  tested  across  all 
treatments.  At  the  same  time,  almost  none  have  evaluated 
their  data  to  see  if  it  meets  the  assumptions  required  to 
use the design (Item j), nor have they used the analysis to make
full use of this data collection plan to isolate effects that
might be carried over from one treatment to the next, a
linear transfer effect (Items l and m). If one can justify
examining  only  linear  transfer  effects,  this  class  of  design 
would  be  an  economical  means  of  studying  transfer  of  training 
using  a  small  group  of  subjects  or  only  a  single  individual. 
Information  on  how  to  analyze  these  two  types  of  designs  is 
not  readily  available  nor  always  clear,  and  so  it  is  provided 
here (Items l and m, respectively).


Interpretation 


In  addition  to  design  and  analysis,  the  third  leg  of 
any research effort is the interpretation of the data. This
differs from analysis: analysis merely organizes
and summarizes the data, while interpretation considers the
practical implications of the results. Some aids to interpretation
can be found in the sections on design and analysis.

For  the  paradigm  for  large-scale  multifactor  research, 
tests  of  lack  of  fit  are  one  means  of  deciding  whether  or  not 
enough  data  has  been  collected  to  write  an  equation  that 
adequately approximates the experimental space. This lack-of-fit
test should be made after each block of data of higher
order has been collected (Item a-ii). Because of the fanatic
and  sometimes  frenetic  reliance  that  psychologists  place  on 
"tests  of  statistical  significance"  in  the  interpretation  and 
evaluation  of  their  experimental  data,  it  is  important  that 
the  ways  this  test  has  been  misused  and  misinterpreted  be 
understood  by  the  practicing  experimenter.  A  large  number  of 
papers  have  appeared  in  the  behavioral  science  literature 
spotlighting  these  deficiencies,  without,  it  seems,  having 
done  much  to  reduce  them.  A  summary  of  the  facts  and  fallacies 
that  surround  this  procedure  (Item  h)  should  sensitize  the 
psychologist  who  uses  this  test  to  its  limitations  as  well  as 
to  the  way  it  has  frequently  been  misused. 

The  techniques  described  in  this  report  are  of  limited 
value  in  isolation.  On  the  other  hand,  they  are  important 
addenda  to  the  material  already  discussed  by  Simon  (1970-1979) 
elsewhere  and  have  specific  applications  for  a  properly 
conducted  VTRS  program  as  well  as  similar  research  projects 
in the future.

SECTION II

WHAT  TO  DO  WHEN  THE  MODEL  FOR  THE  EXPERIMENTAL 
DESIGN  INADEQUATELY  REPRESENTS  THE  EMPIRICAL  DATA 

THE  PROBLEM 

One  desirable  feature  of  a  "good"  experimental  design 
for  mapping  the  response  surface  of  an  experimental  region 
is  that  it  includes  points  which  form  designs  of  lower  order 
so that a sequential build-up of designs of increasing order is
possible (Box, 1964). This feature greatly enhances the
economy  of  data  collection  since  the  information  collected 
early  in  the  sequence  can  be  used  to  identify  critical 
factors,  permitting  factors  contributing  only  trivial 
effects  to  be  dropped  before  the  response-fitting  phase 
begins.  In  general,  it  will  be  true  that  relatively  few 
factors will be needed to account for most of the performance
variability in a specific task. Quite frequently, a
Resolution IV design will provide nearly all of the information
required for factor identification and a second-order
design will adequately describe most response surfaces of
human performance. But what happens when either of these
statements is not true? What procedures must an investigator
employ then?

In  the  discussion  that  follows,  alternative  actions 
available  to  an  investigator  when  these  situations  occur  are 
described.  Since  the  solutions  for  following  up  on  screening 
designs  and  for  expanding  the  response  surface  involve  the 
same  or  similar  considerations,  these  two  problem  areas  are 
treated  together  here. 

Alternative  Actions 


Had  the  investigator  anticipated  the  possibility  that 
higher-order  effects  might  be  present,  he  might  have  started 
with  a  particular  experimental  design  capable  of  expanding 
to  the  desired  order.  Thus  there  are  Resolution  III,  IV,  and 
V  fractional  factorials  capable  of  being  built  from  lower 
order  designs.  Similarly  there  are  second  and  third-order 
response  surface  designs  that  can  be  built  sequentially  from 
lower order designs. As a general principle, however,
Resolution V fractional factorials and sequential third-order
response surface designs tend to be uneconomical,
requiring more data collection than is probably necessary.
For this reason, it is unlikely that the investigator,
even if he anticipates the need, will prefer to employ
this alternative.

When  a  test  of  lack  of  fit  reveals  the  presence  of 
higher-order  effects  not  yet  included  in  the  polynomial,  the 
investigator  should  first  inspect  his  data  for  deviant 
values  from  irrelevant  sources.  Unusual  values  not 
associated  with  the  true  intent  of  the  experiment  may  occur 
if a subject fails to respond in accordance with instructions,
if equipment fails or otherwise "misbehaves," or if data is
recorded (or analyzed) incorrectly. These and similar disturbances
in the data can create results that mathematically
appear as higher-order interaction effects. Any such outliers
therefore must be detected, not only to reduce the
chances  of  drawing  erroneous  conclusions  but  also  to  prevent 
the  investigator  from  being  misled  into  collecting  more  data 
unnecessarily  to  isolate  these  quasi-interactions. 

If  there  is  no  reason  to  suspect  the  accuracy  of  the 
data,  the  investigator  may  consider  transforming  the  data  to 
reduce  or  eliminate  higher-order  effects  before  considering 
the  less  economical  approach  of  collecting  more  data. 
Psychologists have more frequently employed data transformations
to meet the assumptions of statistical analysis than
they have to simplify the regression model. If a simpler
model can be effected with transformations so that higher-order
effects are for all practical purposes eliminated, then
the  amount  of  data  needed  to  approximate  the  correct  model 
is  reduced.  The  difficulty  in  applying  this  approach  is  in 
selecting  the  proper  transformation  or  transformations.  Then, 
too,  some  types  of  non-additivity  can  never  be  eliminated 
with  transformations.* 

When  transformations  fail  to  simplify  the  data,  the 
investigator  has  no  other  choice  but  to  collect  additional 
data  to  isolate  the  effects  that  account  for  the  lack  of  fit. 
Exactly  what  is  best  to  do  in  this  case  is  not  always  obvious 
when  economy  must  be  a  primary  consideration.  It  takes  a 
considerable  amount  of  additional  data  to  change  a 
Resolution IV screening design to Resolution V. No standard
methods are available to expand a second-order central-composite
design when the need for a third-order model is
indicated.  However,  there  are  procedures  for  selecting  a 
limited  number  of  data  points  that  will  isolate  only  the 
effects  of  greater  interest. 

Major  topics  to  be  discussed  in  the  sections  that 
follow  include: 

•  Lack  of  fit  tests 

•  Multivariate  transformations 

•  Augmentation  techniques 


*  An ounce of prevention is worth a pound of cure.
Selecting the scaling should be a major effort of the
pre-experimental planning phase.

TESTING THE FIT OF THE MODEL

In the $2^{k-p}$ screening design, lack of fit refers to the
need to include interactions in what otherwise would be a
first degree polynomial. Their presence may be inferred when
a string of interactions in the Resolution IV designs shows
a non-trivial effect.* There is no need to test for curvature
during the screening phase since the factors are only
at two levels. Since these designs are not replicated, there
are no degrees of freedom for estimating error. However, a
test of the statistical significance of potentially interesting
effects can be approximated using order normalized
plots and estimated error variances (Daniel, 1976; Simon,
1977a).
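
One way such a plot can be prepared is sketched below. It reads the plot as a half-normal plot of the ordered absolute effects, which is an interpretation rather than something the text states. The plotting position 0.5 + 0.5(rank - 0.5)/m is one common convention, and the function name `half_normal_scores` and the effect values are hypothetical.

```python
from statistics import NormalDist


def half_normal_scores(effects):
    """Pair each absolute effect with a half-normal plotting position.

    effects: dict mapping an effect label to its estimate.
    Returns (label, |effect|, half-normal quantile) tuples, smallest first.
    """
    ordered = sorted(effects.items(), key=lambda item: abs(item[1]))
    m = len(ordered)
    standard_normal = NormalDist()
    scores = []
    for rank, (label, value) in enumerate(ordered, start=1):
        # Assumed plotting-position convention for the half-normal quantile.
        p = 0.5 + 0.5 * (rank - 0.5) / m
        scores.append((label, abs(value), standard_normal.inv_cdf(p)))
    return scores


# Hypothetical effect estimates from an unreplicated screening design.
scores = half_normal_scores({"A": 10.0, "B": -20.0, "AB": 0.5})
for label, size, quantile in scores:
    print(f"{label:>3}  |effect| = {size:6.2f}  quantile = {quantile:.3f}")
```

Effects that are mostly noise should fall near a straight line through the origin when |effect| is plotted against the quantile. Points well above that line are candidates for real effects.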

With  quantitative  factors,  the  goal  ultimately  will  be 
to  estimate  a  response  surface.  Lack  of  fit  tests  are  needed 
to  see  if  the  completed  screening  data,  a  first  degree 
polynomial  plus  critical  interaction  terms,  adequately  fits 
a  linear  surface  or  whether  the  curvature  of  the  surface 
must  also  be  accounted  for.  To  determine  whether  quadratic 
terms  are  needed  to  approximate  the  response  surface,  it  is 
necessary  to  add  center  points  to  the  screening  design.  If 
it  can  be  anticipated  that  this  step  will  eventually  be 
taken,  it  is  better  to  do  so  when  the  screening  data  is 
being collected, rather than later (Simon, 1977a).

To  determine  whether  or  not  quadratic  terms  are  needed 
to  fit  the  response  surface,  the  investigator  would  compare 
the  average  performance  of  all  points  in  the  hypercube 
against the average performance at the center, i.e.,

$$\bar{y}_{\text{cube}} - \bar{y}_{\text{center}}$$

This  measure  of  overall  curvature  is  equal  to  the  sum  of  the 
estimated  coefficients  of  the  quadratic  terms.  Whether  or 
not  it  is  larger  than  would  be  expected  by  chance  is 
determined  by  the  magnitude  of  its  ratio  to  the  estimated 
error. 
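
The comparison can be sketched as follows. The sketch assumes that the center point has been replicated, so that the variance among center-point scores supplies the error estimate. The function name `curvature_check` and all response values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def curvature_check(cube_responses, center_responses):
    """Compare the mean of the factorial (cube) points with the center mean.

    Returns the curvature estimate, its standard error, and their ratio.
    Error is estimated from the replicated center points only (an assumption).
    """
    n_cube = len(cube_responses)
    n_center = len(center_responses)
    curvature = mean(cube_responses) - mean(center_responses)
    error_variance = variance(center_responses)  # pure-error estimate
    std_error = sqrt(error_variance * (1 / n_cube + 1 / n_center))
    return curvature, std_error, curvature / std_error


# Hypothetical scores: eight cube points and three center-point replicates.
cube = [10, 12, 14, 16, 11, 13, 15, 17]
center = [12.0, 12.5, 11.5]
curvature, std_error, ratio = curvature_check(cube, center)
print(f"curvature = {curvature:.3f}, s.e. = {std_error:.3f}, ratio = {ratio:.2f}")
```

The ratio would be referred to a t distribution with the degrees of freedom of the center-point variance, here two.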

To determine whether a second degree polynomial is
adequate, tests can be made in the conventional manner. The
sum of squares for a lack of fit term (with one degree of
freedom) can be obtained by subtracting from the total sum
of squares all of the sums of squares for linear, quadratic,
and interaction terms, along with the sums of squares for
center points (error) and blocks, if present. An F ratio
of overall lack of fit variance to error variance can be
obtained. Draper and Herzberg (1971) describe a way of
partitioning this overall estimate into lack of fit due to
interaction and that due to cubic effects (see Simon, 1977a,
pp 169-171).

*  The investigator must also be alert to the possibility
that within a string two large effects with opposite signs
could yield a trivial sum.
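
The subtraction that yields the lack-of-fit sum of squares, and the F ratio formed from it, can be written out as a short sketch. The function name `lack_of_fit_f` and every sum of squares shown are hypothetical. The degrees of freedom for lack of fit default to one, as in the text, but are left as a parameter.

```python
def lack_of_fit_f(ss_total, ss_model_terms, ss_error, df_error,
                  ss_blocks=0.0, df_lack_of_fit=1):
    """Lack-of-fit sum of squares by subtraction, and its F ratio.

    ss_model_terms: sums of squares for the linear, quadratic, and
                    interaction terms in the fitted polynomial.
    ss_error:       sum of squares among replicated center points.
    """
    ss_lack_of_fit = ss_total - sum(ss_model_terms) - ss_error - ss_blocks
    ms_lack_of_fit = ss_lack_of_fit / df_lack_of_fit
    ms_error = ss_error / df_error
    return ss_lack_of_fit, ms_lack_of_fit / ms_error


# Hypothetical sums of squares from a second-order analysis.
ss_lof, f_ratio = lack_of_fit_f(
    ss_total=100.0,
    ss_model_terms=[60.0, 20.0, 5.0],  # linear, quadratic, interaction
    ss_error=6.0,
    df_error=3,
    ss_blocks=4.0,
)
print(f"SS lack of fit = {ss_lof:.1f}, F = {f_ratio:.2f}")
```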

Box, Hunter, and Hunter (1978, pp 522-523) show a shortcut
method of testing the adequacy of the second-order model.
They point out that "if the surface is exactly quadratic in
this direction [of a single dimension], it can be shown that
the estimate of slope obtained from the axial points [of
that dimension] will be the same as that obtained from the
factorial points [of that dimension]." The slope, m, of a
line is equal to

$$m = \frac{y_1 - y_2}{x_1 - x_2}$$

Let us use this to test whether the second-order central-composite
design adequately fits the data or if there is a
third-order component.

To obtain the slope of the axial (star) points ($m_a$), we
substitute:

$y_1$ = the performance at axial (star) point $+\alpha_{x_i}$
$y_2$ = the performance at axial (star) point $-\alpha_{x_i}$
$x_1$ = the coded value of $+\alpha_{x_i}$
$x_2$ = the coded value of $-\alpha_{x_i}$

To obtain the slope for the factorial (cube) points ($m_f$), we
substitute:

$y_1$ = the average performance at factorial points with $x_i = +1$
$y_2$ = the average performance at factorial points with $x_i = -1$

and

$x_1$ = the coded value +1
$x_2$ = the coded value -1

The difference between the two slopes,

$$m_a - m_f$$

supplies an estimate of the sum of the third-order coefficients,
i.e.,

$$\sum_{j} \hat{\beta}_{ijj}$$

where j may equal both i and non-i and i equals each of 1 to
K factors. In the third-order polynomial these are the
combined coefficients for third degree terms,
$x_i x_j^2$ and $x_i^3$.
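
The slope comparison can be tried on an artificial surface. The sketch below is illustrative only: the function names, the two-factor response function `surface`, its coefficients, and the axial distance are all hypothetical. It checks only that the two slopes agree when the surface is exactly quadratic and differ when a cubic term is present.

```python
from statistics import mean


def slope(y1, y2, x1, x2):
    """Slope of the line through (x1, y1) and (x2, y2)."""
    return (y1 - y2) / (x1 - x2)


def third_order_check(axial_plus, axial_minus, alpha, cube_plus, cube_minus):
    """Return the axial slope, the factorial slope, and their difference."""
    m_a = slope(axial_plus, axial_minus, +alpha, -alpha)
    m_f = slope(mean(cube_plus), mean(cube_minus), +1, -1)
    return m_a, m_f, m_a - m_f


def surface(x1, x2, cubic=0.0):
    """Hypothetical noise-free response; `cubic` is the x1^3 coefficient."""
    return 5 + 2 * x1 + 1 * x2 + 3 * x1 ** 2 + 0.5 * x1 * x2 + cubic * x1 ** 3


alpha = 2 ** 0.5  # assumed axial distance for a two-factor design


def check(cubic):
    """Apply the slope comparison along factor x1 of the surface."""
    return third_order_check(
        surface(+alpha, 0, cubic),
        surface(-alpha, 0, cubic),
        alpha,
        [surface(+1, x2, cubic) for x2 in (-1, 1)],
        [surface(-1, x2, cubic) for x2 in (-1, 1)],
    )


quadratic_result = check(0.0)
cubic_result = check(0.4)
print("quadratic surface:", quadratic_result)
print("with cubic term:  ", cubic_result)
```

With real data the difference would of course be compared against its estimated error before a third-order component is inferred.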
TRANSFORMATIONS TO REDUCE LACK OF FIT

When  the  derived  polynomial  fails  to  fit  the  data  and 
there  are  not  enough  degrees  of  freedom  remaining  to  expand 
the model (as is the case with classic second-order central
composite designs and $2^{k-p}$ screening designs), the investigator
should first try to simplify the relationship by
transforming the data. This is not a completely foreign
procedure to psychologists who have used logarithmic transformation
to linearize the relationship between psychological
and physical scales. If the response surface of the
transformed data can be approximated by a lower-order polynomial
than had been possible with the untransformed data,
then  there  will  be  no  loss  of  information  and  the  cost  of 
collecting  additional  data  will  have  been  avoided.  The 
investigator  faces  the  task  of  deciding  what  is  the  best 
transformation  to  use. 

Selecting  a  transformation  is  not  a  simple  task.  Too 
often  the  process  has  been  oversimplified  and  treated  in 
cookbook  fashion.  Sometimes  simple  generalizations  are  made 
verbally, such as: "A reciprocal transformation should be
used when reaction-time scores are involved." In other
cases, the arithmetic relationship between the mean and the
variance of the data is used to select the transformation,
e.g., "When the variance ($s^2$) equals the mean multiplied by
some constant ($kM$), a square root transformation should be
used." These are monotonic transformations; that is, they
handle relationships with "one-bend" in them. Few
psychologists ever consider "two-bend" transformations, the
one exception being the arcsin transformation for handling
percentage data. The most limiting feature of this type of
treatment of transformations is that it deals only with the
one-independent, one-dependent variable case.

The selection of transformations becomes more complicated
when there are multiple independent variables. If
their  relationships  with  the  dependent  variables  differ,  we 
will  not  be  able  to  transform  the  response  data  but  must 
find  the  appropriate  transformation  for  each  independent 
variable.  This,  however,  will  destroy  the  orthogonality 
among  the  independent  variables.  Additional  problems  can 
arise  if  the  transformation  destroys  the  normality  of  the 
distribution  and/or  the  homogeneity  of  the  variance.  Luckily, 
many  transformations  that  correct  one  deficiency  in  the 
data  correct  another;  still  it  is  important  to  realize  that 
this  is  not  always  the  case. 

More  sophisticated  treatments  of  data  transformation 
have  noted  that  what  are  often  viewed  as  independent  methods 
of  modifying  the  data,  e.g.,  reciprocal,  logarithmic,  square 
root,  square,  and  others  used  with  monotonic  data,  are 
actually  members  of  a  common  family  that  can  be  represented 
by a single equation, which varies the "strength" of the
transformation as parameters of the equation are changed.

Box and Cox (1964) present one such general transformation
equation

$$
w = \begin{cases} y^{\lambda} & (\lambda \neq 0) \\ \log y & (\lambda = 0)^{*} \end{cases}
$$

where y is the response data and lambda (λ) is the parameter
to be varied. As lambda takes on the values from -1 to +1
(and the range can be greater), the equation becomes equivalent
to a number of transformations commonly employed by
psychologists (see Table 1).
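
The correspondence between lambda and the familiar transformations can be checked numerically. The following is a minimal sketch, assuming the simple power form of the family (y raised to lambda, with log y taken at lambda = 0 by convention); the function name `power_transform` and the sample score are illustrative assumptions, not part of Box and Cox's paper.

```python
import math

def power_transform(y, lam):
    """Power-family transform: y**lam for lam != 0, log(y) for lam == 0."""
    if y <= 0:
        raise ValueError("this sketch handles positive y only")
    return math.log(y) if lam == 0 else y ** lam

# Lambda values from Table 1 and the transformation each one reproduces.
TABLE_1 = {
    -1.0: "reciprocal",
    -0.5: "reciprocal square root",
    0.0: "logarithm",
    0.5: "square root",
    1.0: "no transformation",
}

if __name__ == "__main__":
    y = 4.0  # arbitrary positive score, for illustration only
    for lam, name in TABLE_1.items():
        w = power_transform(y, lam)
        print(f"lambda = {lam:+.1f}  {name:<24} w = {w:.4f}")
```

Treating lambda = 0 as a separate branch is what lets a plot of results against lambda run through zero without a gap.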


TABLE 1. RELATION AMONG COMMON MONOTONIC
TRANSFORMATIONS, SIGMA-MEAN RATIOS, AND LAMBDA

Transformation           Relation        Lambda   Application

Reciprocal               s = kM^2        -1.0     Reaction time
Reciprocal square root   s = kM^(3/2)    -0.5     -
Logarithm                s = kM           0*      Positively skewed data; sigma
Square root              s^2 = kM        +0.5     Frequencies
No transformation                        +1.0

*This is a convention that allows a plot of results
against lambda to be continuous.

A Practical, Empirical Approach to Multivariate Transformation


Draper and Hunter (1969) propose a rather straightforward
data-inspection technique for selecting the desired
transformation of the dependent or independent (or both)
variables. Their procedure takes advantage of a high-speed
computer to analyze the data after different transformations
have been effected. The desired transformation is selected
by visual inspection once the results of these several
analyses have been properly plotted.

To  systematize  this  process,  Draper  and  Hunter  made  use 
of  the  equation  for  the  family  of  monotonic  transformations 
developed  by  Box  and  Cox.  A  computer  is  programmed  to 
iteratively  change  the  value  of  lambda  in  the  equation  in 
regular  increments  and  to  perform  an  ordinary  analysis  of 
variance  after  each  transformation.*  The  results  are 
plotted  on  a  graph  with  lambda  values  along  the  abscissa  and 
the  results  of  the  ANOVA  along  the  ordinate.  All  sources  of 
variance  are  plotted  on  the  same  graph.  Since  lambda  can 
take  on  any  value,  the  plots  at  the  different  lambdas  can  be 
connected  into  a  continuous  function.  Given  these  functions 
the  investigator  can  select  the  value  of  lambda  (and  thus 
the  transformation)  that  best  meets  his  requirements. 
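
The sweep described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are hypothetical (built to be multiplicative, so the interaction should vanish near lambda = 0), the function names are invented for the sketch, the simple power form of the family is assumed, and the inhomogeneity measure from Draper and Hunter's appendix is not reproduced.

```python
import numpy as np

def power_transform(y, lam):
    """Power family: y**lam for lam != 0, natural log for lam == 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else y ** lam

def two_way_f(y):
    """F-ratios for A, B and AB from a balanced array of shape (a, b, n)."""
    a, b, n = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))   # level means of factor A
    mean_b = y.mean(axis=(0, 2))   # level means of factor B
    cell = y.mean(axis=2)          # cell means
    ss_a = b * n * np.sum((mean_a - grand) ** 2)
    ss_b = a * n * np.sum((mean_b - grand) ** 2)
    ss_ab = n * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
    ss_e = np.sum((y - cell[:, :, None]) ** 2)
    ms_e = ss_e / (a * b * (n - 1))
    return {
        "A": ss_a / (a - 1) / ms_e,
        "B": ss_b / (b - 1) / ms_e,
        "AB": ss_ab / ((a - 1) * (b - 1)) / ms_e,
    }

def sweep(y, lambdas):
    """Run the ANOVA once per lambda; returns {lambda: F-ratio dict}."""
    return {float(lam): two_way_f(power_transform(y, lam)) for lam in lambdas}

# Hypothetical 3 x 4 design with 3 replicates per cell (not real data).
rows = np.array([1.0, 2.0, 4.0])
cols = np.array([1.0, 1.5, 2.0, 3.0])
reps = np.array([0.95, 1.00, 1.05])
y = rows[:, None, None] * cols[None, :, None] * reps[None, None, :]

lambdas = np.round(np.arange(-2.0, 1.2001, 0.1), 1)
results = sweep(y, lambdas)

# The lambda with the smallest interaction F-ratio on this grid.
best_lambda = min(results, key=lambda lam: results[lam]["AB"])
print("lambda minimizing the AB F-ratio:", best_lambda)
```

In practice the three F-ratio curves would be plotted against lambda on one graph, since the choice is a compromise among several criteria rather than the minimum of any single curve.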

To  illustrate  this  procedure,  several  examples  taken 
from  Draper  and  Hunter's  paper  will  be  described  briefly. 

The  reader  is  encouraged  to  refer  to  the  original  paper  for 
additional  details. 

In their first example, they show how a transformation
of the dependent variable can be selected to eliminate an
interaction. From the data collected in a two-factor, 3x4
factorial design, mean squares and F-ratios are determined
for main effects A and B and their interaction AB. The error
variance is also obtained. Using the Box and Cox equation,
Draper and Hunter transform and analyze their data using all
values of lambda between -2 and +1.2 in steps of .01.

Next they plot against lambda the magnitudes of the
F-ratios** for the three effects. They also plot a measure of
"inhomogeneity" to see if the evenness of the within-cell
variances is disturbed severely by the transformations (see
their Appendix, pp 38-39, for the equations needed to
calculate this measure). The complete plot is shown in
Figure 1. Inspection of this figure reveals the lambda (and
thus the transformation) that minimizes the interaction,
maximizes factor effects, and keeps the inhomogeneity within
acceptable bounds.


* There is no reason why other transformations might
not be employed.

** The F-ratios should be used whenever the dependent
variable is transformed, since each transformation of y
yields a different total sum of squares. This is not the
case when only the independent variables are transformed;
then a plot of the mean squares is appropriate.

In this example, Draper and Hunter decided to consider
inhomogeneity values lying within the 95 percent one-tail
confidence limits. This places the candidate lambda between
-1.75 and -0.53. The two main effects are maximized at
-1.35 and -1.25, respectively, and the interaction is minimized
at approximately -0.60. They recommend a lambda of
-1.00 as a sensible compromise. Inhomogeneity at this point
is not a minimum, but it is not excessively high. A lambda
of -1.00 in the Box and Cox equation means that a reciprocal
transformation should be used.

In their second example, Draper and Hunter show how they
selected a transformation to linearize the results of a 3^3
factorial experiment analyzed by "standard response surface
methods." In this case, the total variance is partitioned
into linear, quadratic, and residual variances. To find the
transformation of the dependent measure that would enable the
three dimensional surface to be represented by a first-order
equation, they plot the F ratios for the linear and quadratic
terms against values of lambda. They suggest that since the
goal is to maximize the linear component and minimize the
quadratic, one might also plot the ratio of the linear mean
square over the quadratic mean square against lambda.
Inspection of the graphic plot (Figure 2) shows that when this
ratio is maximum, the lambda is essentially zero, which means
the desired transformation is logarithmic.

In a third example, Draper and Hunter illustrate how
this graphic method can be used to select a transformation
if both independent and dependent variables are transformed.
They state as a general principle that when multiple
variables are to be transformed, it is not appropriate to
transform the variables one at a time, an often used method;
instead all must be transformed simultaneously (see Hill,
1966). The particular data used for this example was from
an undesigned experiment. The problem was to fit the data
with the simplest form of the regression model of the form

$$
E(w) = \beta x^{\alpha}
$$

where w is the Box-Cox transformation with λ the unknown.
In this case the equation was modified; (y+1)^λ was substituted
for y^λ. Since both independent (x) and dependent (y)
variables must be transformed, we must determine the values
of two unknown transformations, α and λ. The sum of squares
for this model can be partitioned into that due to the
regression, the lack of fit, and pure error. We need to find
the minimum ratio of the mean square for the lack of fit over
the error variance, that is, the minimum F-ratio for lack of
fit.

Figure 1. Plot to Select Transformation to Get Rid of an Interaction
[From Draper and Hunter (1969)]


Figure 3 shows a contour plot of the F-ratio (F) for
lack of fit plotted on a two-dimensional graph of alpha
versus lambda. Draper and Hunter also calculate the measure
of inhomogeneity for the lambda dimension; the vertical lines
show the acceptable region for this measure. The preferred
transformation is where the lack of fit contour is at a
minimum. This is located where lambda equals one (signifying
no transformation) and alpha equals 1.5. Thus the simpler
model is

$$
E(y) = \beta x^{3/2}
$$

to provide an additive representation of this set of data.
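
The two-parameter search can be sketched as a grid evaluation of the lack-of-fit F-ratio. The sketch below rests on several assumptions that are not from Draper and Hunter: the data are hypothetical and generated from a known x^1.5 law, the unmodified power y^λ is used rather than (y+1)^λ, the model is fitted through the origin with the single parameter β, and repeated x values supply the pure-error estimate.

```python
import numpy as np

def power_transform(y, lam):
    """Power family: y**lam for lam != 0, natural log for lam == 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else y ** lam

def lack_of_fit_f(x, y, alpha, lam):
    """Lack-of-fit F-ratio for E(w) = beta * x**alpha, w = power_transform(y, lam)."""
    x = np.asarray(x, dtype=float)
    w = power_transform(y, lam)
    z = x ** alpha
    beta = np.sum(z * w) / np.sum(z * z)   # least squares through the origin
    levels = np.unique(x)
    ss_lof = 0.0   # replicate means versus fitted values
    ss_pe = 0.0    # replicates versus their own mean (pure error)
    for level in levels:
        w_level = w[x == level]
        ss_pe += np.sum((w_level - w_level.mean()) ** 2)
        ss_lof += w_level.size * (w_level.mean() - beta * level ** alpha) ** 2
    df_lof = levels.size - 1               # one fitted parameter (beta)
    df_pe = w.size - levels.size
    return (ss_lof / df_lof) / (ss_pe / df_pe)

# Hypothetical data: five x levels, three replicates each (not real data).
x = np.repeat([1.0, 2.0, 3.0, 4.0, 5.0], 3)
noise = np.tile([-0.1, 0.0, 0.1], 5)
y = 2.0 * x ** 1.5 + noise

alphas = [0.5, 1.0, 1.5, 2.0]
lambdas = [-1.0, -0.5, 0.0, 0.5, 1.0]
grid = {(a, l): lack_of_fit_f(x, y, a, l) for a in alphas for l in lambdas}

# The (alpha, lambda) pair with the smallest lack-of-fit F on this grid.
best = min(grid, key=grid.get)
print("alpha, lambda with minimum lack-of-fit F:", best)
```

A contour plot of `grid` over alpha and lambda would correspond to the kind of display described for Figure 3; both parameters are varied together rather than one at a time.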

Draper and Hunter make several general suggestions
regarding the use of this graphic plot technique for selecting
transformations. First, since there is usually a band
of choices that can be made, better decisions can often be
made given several sets of data collected on the same
problem. Second, while the empirically-derived transformations
have a pragmatic value and can be found, they must
not totally divert attention from eventually developing a
theoretically-based model. [Simon says: That may be easier
to do in the physical than in the behavioral sciences.]
Third, and most important, they recommend that once a
transformation has been selected and the data analyzed, it be
subjected to a residual analysis (see Daniel, 1976; Draper
and Smith, 1968; Box, Hunter, and Hunter, 1978). This
residual analysis is first performed on the transformed data
using the higher-order polynomial model. Then if it appears
that the simpler model is appropriate, the transformed data
is reanalyzed with the higher-order terms omitted and the
residual analysis is repeated on these results. Finally,
they suggest that a degree of freedom should be removed from
the residual for each parameter that is transformed.

It should be noted in closing that transformation will
not simplify all forms of data. When interactions are
disordinal (or intrinsic), they cannot be eliminated by
transforming the data. Transformations will only simplify ordinal
interactions and curvilinear effects. Luckily, these occur
the most frequently in human performance data, and so
transformations do much to reduce the need to collect additional
data when lack of fit occurs.

When interactions are eliminated by means of data
transformation, care must be taken not to misinterpret the results.
If a critical interaction effect, found in the original
analysis, is no longer found after the data has been
transformed, there is no contradiction in the true interpretation
of data. It is not enough to say, as too many psychologists
do, that there is or is not a statistically significant
interaction without relating it to the measurement scale that
is involved. Results of psychological experiments are
"situation specific", and this includes the characteristics
of the data and its analysis as well as those of the subjects,
equipment, environment, and time dimension.

Figure 3. Plot to Select Transformations when Changing Both
X and Y* (contours of the lack-of-fit F-ratio plotted against
alpha and lambda, with the acceptable region based on the
measure of inhomogeneity marked)
[From Draper & Hunter (1969)]

*α is transformation for X, and λ is transformation for Y.

AUGMENTATION


When transformations fail to simplify the data to the
level at which it can be approximated adequately by a
lower-order (i.e., 1st or 2nd) model, additional data must be
collected. There are a number of occasions as the sequential
development of the "new paradigm" unfolds when the
investigator will be faced with the problem of deciding what
augmentation data collection plan he must use.

There  are  two  occasions  on  which  the  augmentation  design 
is  fairly  stereotyped  and  can  be  derived  rather  mechanically. 
These  are: 

1. When the second block must be added to the
original Resolution III screening design in
order to isolate main from two-factor
interaction effects (see Box and Hunter, 1961;
also Simon, 1973, pp 105-116).

2. When axial points must be added to the hypercube
of a central-composite design in order to
approximate a second-order response surface
(see Box and Hunter, 1961; also Simon, 1970;
1973, pp 131-139).

There  are,  however,  other  occasions  when  more  data  must  be 
collected  but  the  rules  for  selecting  the  data  points  are 
less  well-defined.  The  investigator  in  these  cases  must 
exercise  his  judgment  and  skill.  The  more  important  of 
these  occur  in  three  segments  of  the  research  program. 
Additional  data  may  have  to  be  collected: 

1. To isolate critical two- or three-factor disordinal
interactions from strings obtained from a
Resolution IV design during the screening phase.
This augmentation is necessary to avoid the
risk of ignoring influential factors, the
effects of which might be found in their
interaction with one or more other factors.

2. To isolate all critical two-factor interactions
aliased in strings of a Resolution IV
design when screening data is to be used to
complete the "hypercube" portion of a
central-composite design. If all critical two-factor
interactions are isolated from main effects and
one another, then the plan is equivalent to a
complete Resolution V design in which all
two-factor interactions are isolated from main effects
and one another. This equivalence is achieved
with a reduced data collection effort. This
reduced design will be referred to as a
Resolution V- design.

3. To isolate third-order terms when the second-order
model does not adequately fit the
response surface. This is necessary to minimize
the prediction bias based on an inadequately
developed polynomial.

So for whatever reason one isolates the interactions in
strings, more data must be collected. The goal is to do
this as inexpensively as possible. Two basic approaches
are employed: 1) To isolate all interactions within the
string, or 2) To attempt to "guess" which interactions are
critical and probe with a few data points to verify the
hypothesis. Which approach will be used depends on:
1) A priori information regarding potentially important
interactions; 2) The length of the aliased strings; 3) The
cost and time restrictions on further data collection;
4) The precision with which the effects must be estimated;
and 5) The damage that could occur if an effect is neglected.

Isolating  All  Interactions  within  a  String 

Techniques  for  isolating  all  the  interactions  within  a 
string  have  been  described  by  Box,  Hunter,  and  Hunter  (1961) 
and  Daniel  (1976) .  The  example  that  follows  was  taken  from 
Box,  Hunter,  and  Hunter's  book  (1978,  pp  414-416).  First 
the  method  of  selecting  the  new  experimental  conditions  is 
described;  then  an  example  is  given  to  show  how  the  new  data 
is  combined  with  the  data  from  the  original  block. 

We shall assume that data has already been collected to
complete a 2^(8-4) screening design in which there are eight
candidate factors and 16 experimental conditions. The design
was actually composed of a Resolution III design plus its
fold-over. The values of the mean, the eight main effects
(aliased with three-factor interactions), and seven strings
of four two-factor interactions are all given in Table 2.
Among the effects, Factors C, E, and H appear critical along
with the two-factor interaction string, (AE). This string
represents the combined effect of the four aliased
interactions: (AE + BF + CH + DG).

To select the conditions required to isolate the
interactions, N additional conditions are required to form an
orthogonal block for isolating N interactions in the string.
In our example, N = 4. A pattern of signs in an
interactions-by-conditions matrix must be determined that satisfies
the orthogonality requirement. One such pattern is shown on
the left side of Table 3. If the interactions have these
particular signs for each condition, then the main effects
must have signs such as those shown on the right side of the
same table.

TABLE 2. CONTRIVED RESULTS FROM BLOCK I DATA

ABCDEFGH 
^71  ^71  575  ^71  ^778  :7l  76  T7T~ 

(AB)   (AC)   (AD)   (AE)*   (AF)   (AG)   (AH)   Mean

 -.6     .9    -.4    4.6     -.3    -.2    -.6   19.75

[*(AE) = AE + BF + CH + DG]


TABLE 3. SELECTING THE CONDITIONS TO
ISOLATE INTERACTIONS IN STRINGS

     Interactions to be        Block of conditions capable of
     isolated                  isolating the interactions to left

     AE   BF   DG   CH         A   B   C   D   E   F   G   H

1)   +    -    -    +     1)   -   +   +   +   -   -   -   +
2)   +    +    -    -     2)   -   +   -   -   -   +   +   +
3)   +    -    +    -     3)   +   +   -   -   +   -   -   +
4)   +    +    +    +     4)   +   +   +   +   +   +   +   +

TABLE 4. SIGN PATTERN AND RESULTS FOR CRITICAL
EFFECTS OF THE AUGMENTATION BLOCK

               ..........Critical Effects..........    ......Results......

               M    C      E      H    AE   BF   DG   CH     Y      Y'     Y"

Coefficients
from Block I        2.75   -1.9   .6   [string effect = 2.3]

Run 17         +    +      -      +    +    -    -    +     29.4   24.15    3.2
Run 18         +    -      -      +    +    +    -    -     19.7   19.95   -1.0
Run 19         +    -      +      +    +    -    +    -     13.6   17.65   -3.3
Run 20         +    +      +      +    +    +    +    +     24.7   23.25    2.3

Neither set of signs is unique, but all must satisfy
the relationships found between the main and interaction
effects. Thus, if we make the sign of Factor A a minus
(actually a -1), then in order to have the sign of the AE
interaction a plus (actually +1), it is necessary that E also be
assigned a minus. Similarly, if BF is to have the sign of
minus in the first condition, then either B must be plus and
F, minus, or vice versa. In this example, the former
pattern was used. Once the signs have been determined for
all of the main effects, the experimental condition can be
identified by letters with the plus signs. For the first
condition in the example in Table 3, therefore, with B, C,
D, and H showing the plus sign, the experimental condition
will be designated bcdh, indicating which factors are to be
set at the high (or +1) level for that condition.
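
The sign rules just described are easy to check mechanically. The sketch below is an illustration only: the array holds the main-effect signs for the four added conditions as read from Table 3 (with the sign of G in the third condition taken as minus, the value the rules require given the other signs), and the variable and function names are assumptions made for this example.

```python
import numpy as np

FACTORS = "ABCDEFGH"
STRING = ["AE", "BF", "DG", "CH"]   # interactions aliased in the (AE) string

# Main-effect signs for the four added conditions (rows) by factor (columns).
main = np.array([
    [-1,  1,  1,  1, -1, -1, -1,  1],
    [-1,  1, -1, -1, -1,  1,  1,  1],
    [ 1,  1, -1, -1,  1, -1, -1,  1],
    [ 1,  1,  1,  1,  1,  1,  1,  1],
])

def interaction_signs(main_signs, pairs):
    """Sign of each two-factor interaction = product of its factors' signs."""
    cols = [main_signs[:, FACTORS.index(p[0])] * main_signs[:, FACTORS.index(p[1])]
            for p in pairs]
    return np.column_stack(cols)

inter = interaction_signs(main, STRING)

# Orthogonality check: the cross-product matrix is N times the identity.
gram = inter.T @ inter

# Name each condition by the lower-case letters of factors at the high level.
labels = ["".join(f.lower() for f, s in zip(FACTORS, row) if s > 0) or "(1)"
          for row in main]

print(inter)
print(gram)
print(labels)
```

Any other assignment of main-effect signs that reproduces the same interaction columns would serve equally well, which is the sense in which neither set of signs is unique.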

To combine old and new data, let us construct Table 4.
We will begin by calculating the coefficients for the Block
I (original) data. These are equal to one-half of the value
of each effect. Thus the effect of Factor C equals 5.5; its
coefficient equals 2.75. Since the effects have not yet
been isolated, we cannot know the coefficients for the
individual interactions in the string; only the coefficient
for the combined string can be determined. In this example,
it would be one-half of 4.6, or 2.3. These values along
with the appropriate sign pattern are laid out for the four
additional runs, numbers 17 through 20, as shown in Table 4.
To the right of this table are three columns, designated Y,
Y', and Y".

Column  Y  contains  the  performance  score  obtained  for 
each  of  the  four  new  runs.  (Incidentally,  note  that  for  Run 
20  the  signs  of  the  interactions  in  the  string  are  all  plus, 
corresponding  to  the  levels  in  the  original  string.) 

Column Y' contains the values after the Y values have
been corrected for the known main effects. Using Run 17 to
illustrate this step, we substitute coefficients and terms
in the following equation:

$$
\text{Mean} + \beta_C C + \beta_E E + \beta_H H + [0.5(AE)](AE) = \hat{Y}
$$

thus

Mean + (2.75)(+1) + (-1.9)(-1) + (.6)(+1) + [0.5(AE)](+1) = 29.4

and consolidating these values, we obtain:

Mean + [0.5(AE)] = 29.4 - 5.25

or Y' for Run 17 equals:

Mean + [0.5(AE)] = 24.15

The technique described above can be used to isolate
three-factor interactions in strings in the same way it was
used to isolate the two-factor interaction effects in
strings.

Augmenting  Designs  with  Incomplete  Blocks 

The investigator may, for various reasons, not wish to
add a complete block of new data. In a screening design or
the factorial portion of the central-composite design, he
may want to use only a few additional data points to probe
and crudely test his hypothesis that a particular
interaction within a string is accounting for most of the observed
variance within the string. Too much "guessing" of this
sort can turn out to be more expensive than a more formal
approach, but it must be considered as a viable alternative
when there are a great many effects within strings and when
the clues are available and strong (Simon, 1973, pp 112-115).

In the case of a response surface, an investigator may
wish to add some data points where the slope of the surface
is steepest, and/or changing rapidly, to improve the
precision of the estimates. Or he may wish to add star points
in one corner of the original fractional factorial plan to
develop a non-central composite design.

In  both  cases,  adding  only  a  few  points  will  often 
destroy  the  orthogonality  of  the  design,  a  condition  that  is 
more  important  in  the  identification  phase  than  in  the 
response  surface  phase  of  the  research.  In  the  latter  case, 
we  have  little  interest  in  individual  factors  or  their 
effects  and  are  concerned  primarily  with  the  overall  shape 
of  the  multifunction  surface.  A  regression  analysis  can  be 
used  to  incorporate  the  new  data  into  the  earlier  data.  If 
it  is  important,  a  few  additional  points  might  be  employed 
to  improve  the  orthogonality  using  the  technique  proposed 
by  Dykstra  (1971;  also  see  Simon,  1975,  pp  26-30) . 

When new conditions are tested at periods of time far
removed from that when the original data was collected, one
must be careful of shifts in the performance level due to
external irrelevant sources. Orthogonal blocking techniques
would ordinarily be employed to handle these situations.
Where complete blocks are not involved, however, we might
follow Hebble and Mitchell's (1972, p 771) suggestion of
using a blocking term in the regression equation which
already has a constant, β0, in the model. A "dummy variable"
is created by assigning a zero value to each condition in
the original design and a value of one to the new conditions.
Regression techniques would be used to analyze the combined
data. When this is done, the model for the original design
is unchanged by the introduction of the blocking variable.
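
A minimal sketch of the dummy-variable idea follows. Everything specific in it is an assumption made for illustration: the two-factor design, the added points, the response coefficients, and the size of the shift between sessions are hypothetical, and ordinary least squares via NumPy stands in for whatever regression program would actually be used.

```python
import numpy as np

# Hypothetical original block: a 2 x 2 factorial in coded units.
x_old = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
# Hypothetical points added later, not forming a complete block.
x_new = np.array([[0.5, 0.5], [-0.5, 0.5]])

def response(x, shift=0.0):
    """Noise-free illustrative response; 'shift' mimics a change between sessions."""
    return 10.0 + 2.0 * x[:, 0] - 1.0 * x[:, 1] + shift

y_old = response(x_old)
y_new = response(x_new, shift=3.0)

def design_matrix(x, block):
    """Columns: constant, x1, x2, blocking dummy (0 = original, 1 = new)."""
    n = len(x)
    return np.column_stack([np.ones(n), x, np.full(n, block)])

X = np.vstack([design_matrix(x_old, 0.0), design_matrix(x_new, 1.0)])
y = np.concatenate([y_old, y_new])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2, block_shift = coef
print("constant, x1, x2:", b0, b1, b2)
print("estimated shift between sessions:", block_shift)
```

Because the dummy is zero for every original condition, the shift is absorbed by its own coefficient and the terms describing the original design keep the same meaning.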
Values were not substituted for the Mean nor for the
coefficient of the (AE) string; this latter remains represented
by the value, 0.5(AE), or one-half the (AE) effect,
multiplied by the appropriate plus or minus term, which in
the case of Run 17, is +1.

The new mean is calculated using Run 20 Y', where the
signs are all plus. Substituting in the following equation:

Mean + 0.5(AE)(+1) = Run 20 Y'

thus

Mean + 2.3 = 23.25

or

Mean = 23.25 - 2.3 = 20.95.

Column Y" in Table 4 is obtained by subtracting the
Mean value from each value in Column Y'.


The values in Column Y" can be used to estimate the
effects of the individual interactions, either by summing
them according to the sign matrix for each interaction or
using Yates' algorithm (see Simon, 1977a). The sign pattern
for the interactions shows that they are ordered according
to Yates' standard order. Thus the calculations with
Yates' algorithm would look like this with the values in
Column Y":


Run #    Y" Data    Step #1    Effects-Sum    Effects    Effect Name

17         3.2        2.2         1.2            .6         AE
18        -1.0       -1.0         1.4            .7         BF
19        -3.3       -4.2        -3.2          -1.6         DG
20         2.3        5.6         9.8           4.9         CH

It  is  apparent  that  interaction  CH  accounted  for  most  of  the 
observed  effect  of  the  string  and  that  interactions  AE  and 
BF  are  probably  trivial. 
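
The arithmetic of the example can be reproduced in a few lines as a check. The sketch below uses the numbers from Table 4 and the sign-matrix route (summing Y" according to each interaction's signs) rather than Yates' algorithm; the variable names are illustrative assumptions.

```python
import numpy as np

# Block I coefficients for the critical main effects C, E, H (half the effects).
main_coef = np.array([2.75, -1.9, 0.6])
string_coef = 2.3                      # half of the (AE) string effect, 4.6

# Signs of C, E, H for Runs 17-20.
main_signs = np.array([
    [ 1, -1, 1],
    [-1, -1, 1],
    [-1,  1, 1],
    [ 1,  1, 1],
])
# Signs of AE, BF, DG, CH for Runs 17-20.
inter_signs = np.array([
    [1, -1, -1,  1],
    [1,  1, -1, -1],
    [1, -1,  1, -1],
    [1,  1,  1,  1],
])
y = np.array([29.4, 19.7, 13.6, 24.7])   # observed scores, Runs 17-20

# Y': scores corrected for the known main effects.
y1 = y - main_signs @ main_coef
# New mean from Run 20, where every interaction in the string is at plus.
new_mean = y1[3] - string_coef
# Y": corrected scores with the new mean removed.
y2 = y1 - new_mean
# Effects: signed sums of Y" divided by half the number of runs.
effects = inter_signs.T @ y2 / 2

for name, value in zip(["AE", "BF", "DG", "CH"], effects):
    print(f"{name}: {value:+.2f}")
```

The four isolated effects add up to the original string effect of 4.6, a quick consistency check on the procedure.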
It is difficult to conceive of a data collection plan
that can't be orthogonally blocked to some extent since
pairs of observations can be selected to be some fraction,
however small, that is an orthogonal block to the original
set of data. Dykstra (1966, p 279) suggested that the
coordinates of each pair of new observation points should
equal the average of the original block.

SECTION III

USING YATES' ALGORITHM WITH SCREENING DESIGNS


Yates' algorithm (Simon, 1977a, pp 66-71; Davies, 1967,
pp 263-265; Box, Hunter, and Hunter, 1978, pp 323-324; Yates,
1937) provides a convenient way of analyzing 2^k and 2^(k-p)
designs with or without the use of a computer. With screening
and other fractional factorial designs, the use of this
algorithm is complicated because of the requirement to list
conditions and the effects in Yates' standard order, since
with these designs not all conditions of the factorial are
used and the experimental effects are aliased in sets.
This complication is even more evident in the case of Simon's
screening designs robust to trends (Simon, 1977a, pp 13-24).
How Yates' algorithm is used to analyze a trend-robust
screening design is explained below.

DETERMINING  THE  "STANDARD  ORDER"  OF  THE  EXPERIMENTAL 
CONDITION 

In Table 5, an example of Simon's trend-robust
screening design is shown, i.e., a Resolution IV design.
To use Yates' algorithm, the experimental conditions must be
listed in the "Standard Order": (1), a, b, ab, c, ac, bc,
abc, d, and so forth. The conditions in Table 5, however, are
aefg, bcdh, bcfg, and so forth. How do we reconcile the two
lists and place the conditions used in the experiment in the
standard order?


This is accomplished by ignoring the new screening design
labels and corresponding names for the experimental
conditions (e.g., aefg, bcdh, ... (1), abcdefgh) and thinking in
terms of the original factor labels. Remember, every fractional
factorial has the same sign matrix as a complete
factorial for fewer factors. Thus this 2^(8-4) design has the
same sign matrix as a 2^4 factorial, although in this case
the columns have been rearranged. To find the names of the
conditions for the original design, we first find columns
representing original factors A, B, C, and D. These are
easy to identify by the alternating -+ pattern for A, --++
pattern for B, ----++++ pattern for C, and so forth. These
columns are used to reconstruct the original names of the
conditions. For example, take new experimental condition
aefg, the first row of the matrix. Look for the signs in
that row associated with original factors A, B, C, D, which
in this design are -, -, -, -, respectively. This new aefg is
therefore the original (1) condition. Take row two with
the new condition label, bcdh, and find the signs in that
row for original factors A, B, C, D. This time the signs
are +, -, -, -, which is original experimental condition a.

This  process  of  finding  the  original  condition  labels  from 
the  signs  associated  with  the  original  factor  labels  would 
continue until complete. (Note: While in this example the
conditions turn out to be in the standard order as listed,
this need not always be the case and the investigator must
determine it for each fractional factorial.)


TABLE 5. 2^(8-4) TREND RESISTANT SCREENING DESIGN

                                     NEW SCREENING DESIGN LABELS
TEST   EXPERIMENTAL            (Main Effects)*                       (Two-Factor Interaction Strings)**
ORDER  CONDITION     (I)   A    B    C    D    E    F    G    H      AH   AG   AF   AE   AD   AC   AB

  1    AEFG           +    +    -    -    -    +    +    +    -      -    +    +    +    -    -    -
  2    BCDH           +    -    +    +    +    -    -    -    +      -    +    +    +    -    -    -
  3    BCFG           +    -    +    +    -    -    +    +    -      +    -    -    +    +    -    -
  4    ADEH           +    +    -    -    +    +    -    -    +      +    -    -    +    +    -    -
  5    BDEG           +    -    +    -    +    +    -    +    -      +    -    +    -    -    +    -
  6    ACFH           +    +    -    +    -    -    +    -    +      +    -    +    -    -    +    -
  7    ACDG           +    +    -    +    +    -    -    +    -      -    +    -    -    +    +    -
  8    BEFH           +    -    +    -    -    +    +    -    +      -    +    -    -    +    +    -
  9    CDEF           +    -    -    +    +    +    +    -    -      +    +    -    -    -    -    +
 10    ABGH           +    +    +    -    -    -    -    +    +      +    +    -    -    -    -    +
 11    ABDF           +    +    +    -    +    -    +    -    -      -    -    +    -    +    -    +
 12    CEGH           +    -    -    +    -    +    -    +    +      -    -    +    -    +    -    +
 13    ABCE           +    +    +    +    -    +    -    -    -      -    -    -    +    -    +    +
 14    DFGH           +    -    -    -    +    -    +    +    +      -    -    -    +    -    +    +
 15    (1)            +    -    -    -    -    -    -    -    -      +    +    +    +    +    +    +
 16    ABCDEFGH       +    +    +    +    +    +    +    +    +      +    +    +    +    +    +    +

ORIGINAL FACTORIAL
LABELS                I    ABCD ABC  ABD  ACD  AB   AC   AD   A      BCD  BC   BD   CD   B    C    D

*Three-factor interaction strings aliased with main effects.
**Two-factor interaction strings aliased with other two-factor
interactions.

APPLYING  YATES'  ANALYSIS  TO  THE  DATA 

In  accordance  with  Yates'  algorithm,  the  performance 
scores  associated  with  each  original  label  condition  are 
listed  in  the  standard  order  and  the  analysis  is  performed. 
Each  effect  or  coefficient  derived  from  the  analysis  is 
then  identified,  i.e.,  Mean,  A,  B,  AB,  C,  AC,  BC,  ABC,  D, 
etc.  by  listing  the  original  factor  labels  in  standard  order. 
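
For readers who want to run the analysis by computer, the following is a minimal sketch of Yates' algorithm for responses already listed in standard order. The function names are assumptions made for this sketch; the divisors used (the number of runs for the mean, half the number of runs for each effect) follow the usual convention for two-level designs.

```python
def yates(y):
    """Contrast totals in Yates' standard order.

    y: responses listed in standard order, (1), a, b, ab, c, ...;
    its length must be a power of two.
    """
    n = len(y)
    k = n.bit_length() - 1
    if n < 2 or 2 ** k != n:
        raise ValueError("number of responses must be a power of two")
    col = list(y)
    for _ in range(k):
        pairs = [(col[i], col[i + 1]) for i in range(0, n, 2)]
        # Upper half: sums of successive pairs; lower half: differences.
        col = [lo + hi for lo, hi in pairs] + [hi - lo for lo, hi in pairs]
    return col

def standard_labels(k, letters="ABCDEFGH"):
    """Effect names in standard order: Mean, A, B, AB, C, AC, BC, ABC, ..."""
    labels = []
    for i in range(2 ** k):
        name = "".join(letters[j] for j in range(k) if (i >> j) & 1)
        labels.append(name or "Mean")
    return labels

def effects(y):
    """Map each standard-order label to its mean or effect estimate."""
    n = len(y)
    k = n.bit_length() - 1
    out = {}
    for label, total in zip(standard_labels(k), yates(y)):
        out[label] = total / n if label == "Mean" else total / (n / 2)
    return out

if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0, 4.0]   # illustrative scores for (1), a, b, ab
    print(effects(scores))
```

The labels returned here are the original factorial labels; for a screening design they still have to be replaced by the new design labels.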

SUBSTITUTING  NEW  EFFECT  LABELS  FOR  ORIGINAL  EFFECT  LABELS 

We relate the new labels in the Table 5 design to the
original labels simply by replacement. In our example, we
replace original factor label A with new factor label H,
replace B with AD, AB with E, C with AC, AC with F and so
forth.
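
The replacement list can also be derived mechanically by matching sign columns. The sketch below is illustrative only: it assumes the sixteen conditions of Table 5 in test order (several names in the printed table are hard to read, and the list below is the reading consistent with the sign pattern), labels each two-factor string by its member containing A, and uses variable names invented for the example.

```python
import numpy as np

# Conditions of the 2^(8-4) design in test order; "" stands for (1).
conditions = ["aefg", "bcdh", "bcfg", "adeh", "bdeg", "acfh", "acdg", "befh",
              "cdef", "abgh", "abdf", "cegh", "abce", "dfgh", "", "abcdefgh"]

# New design columns: a factor is +1 when its letter appears in the condition.
new_cols = {f: np.array([1 if f.lower() in c else -1 for c in conditions])
            for f in "ABCDEFGH"}
# Each two-factor string is labeled by the member that contains A.
for f in "BCDEFGH":
    new_cols["A" + f] = new_cols["A"] * new_cols[f]

# Original 2^4 factorial columns for rows in standard order.
runs = np.arange(16)
orig_cols = {}
for effect in range(1, 16):
    name = "".join("ABCD"[j] for j in range(4) if (effect >> j) & 1)
    col = np.ones(16, dtype=int)
    for j in range(4):
        if (effect >> j) & 1:
            col = col * np.where((runs >> j) & 1, 1, -1)
    orig_cols[name] = col

# Pair each original effect with the new label that has the same sign column.
mapping = {}
for oname, ocol in orig_cols.items():
    for nname, ncol in new_cols.items():
        if np.array_equal(ocol, ncol):
            mapping[oname] = nname

for oname, nname in mapping.items():
    print(f"original {oname:<4} -> new {nname}")
```

Applied to the output of Yates' algorithm, this mapping renames each estimate from its original factorial label to the main effect or interaction string it represents in the screening design.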
SECTION IV
ANALYZING  RESIDUALS 


When  economical  multifactor  designs  are  used,  the 
investigator  needs  all  the  help  he  can  muster  to  make  certain 
that  he  is  interpreting  the  data  properly.  Residual  analysis 
is  useful  for  this  purpose. 

Residuals  are  the  unexplained  variance  in  experimental 
data.  A  residual  is  the  difference  between  an  observed  and 
a  predicted  score,  the  prediction  being  obtained  from  the 
regression  equation  derived  from  the  data  itself.  Ideally, 
residuals  have  a  zero  mean,  a  normal  distribution,  a  constant 
variance, and are independent of one another. Too much
deviation from these ideals can distort the true interpretation
of results based on analyses in which these conditions
are assumed to be true. Inspection of the residuals can
help the experimenter evaluate and interpret his data and
decide what the next step in the analysis should be. Techniques
for analyzing residuals have been discussed by
Anscombe and Tukey (1963), Box and Cox (1964), Draper and
Smith (1968), Daniel and Wood (1971), Wood (1973), Daniel
(1976), and Box, Hunter, and Hunter (1978).

CALCULATING  RESIDUALS 

While predicted values may be determined individually
by the direct use of the regression equation, use of the
reverse Yates' algorithm is perhaps the quickest way to
perform this task when a $2^k$ or $2^{k-p}$ experimental design has
been employed. This technique has been described by
numerous authors (Box, Hunter, and Hunter, 1978, p 344;
Daniel, 1976, p 19; Hunter, 1966; Simon, 1977a, pp 78-79).
Each predicted performance value is then subtracted from the
observed performance value of the corresponding experimental
condition to obtain the residuals.
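
The reverse calculation can be sketched as follows. This is an illustrative version, not the procedure as printed in any of the cited sources: it assumes a full $2^k$ layout in standard order, works with contrast totals rather than effects, and the names `yates_passes`, `predicted_values`, and `residuals` are hypothetical. The contrast totals for the terms kept in the model are written in reverse standard order, the same sum-and-difference passes are applied, and the results are divided by N.

```python
def yates_passes(column):
    """Run the k sum-and-difference passes of Yates' algorithm."""
    n = len(column)
    k = n.bit_length() - 1
    if n < 2 or 2 ** k != n:
        raise ValueError("need 2**k values with k >= 1")
    column = [float(v) for v in column]
    for _ in range(k):
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs
    return column


def predicted_values(contrast_totals, keep):
    """Reverse Yates: fitted values from the retained contrast totals.

    contrast_totals: forward Yates output in standard order (grand total first).
    keep: indices (standard order) of the terms retained in the model;
          index 0, the grand total, is always retained.
    """
    n = len(contrast_totals)
    kept = [c if (i == 0 or i in keep) else 0.0
            for i, c in enumerate(contrast_totals)]
    # Feed the totals in reverse order; the output is in reverse order too.
    reversed_fit = yates_passes(kept[::-1])
    return [v / n for v in reversed_fit[::-1]]


def residuals(observed, keep):
    """Observed minus predicted score for each condition, in standard order."""
    totals = yates_passes(observed)
    fitted = predicted_values(totals, keep)
    return [y - f for y, f in zip(observed, fitted)]


# Hypothetical 2**2 data: (1), a, b, ab; model keeps A (index 1) and B (index 2).
print(residuals([0, 0, 0, 1000], keep={1, 2}))
```

When every term is kept, the fitted values reproduce the observations and all residuals are zero, which is a convenient check on the arithmetic.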

APPLICATIONS OF RESIDUAL ANALYSIS

Residual  analysis  may  be  used  to: 

1. Check how well the data meet the assumptions of
normality and zero mean. A frequency plot of the
residuals or an ordered plot on normal probability
paper (see item 4 below) will indicate how close
the residuals are to being normally distributed
with zero mean. Since finite data will show
variations from the ideal, some experience is
needed to interpret these plots. Daniel and
Wood (1971, Appendix 3A) and Daniel (1976, Appendix
6A) illustrate how such fluctuations might appear
if due to chance alone.
2. Check how well the data meet the assumption of
homogeneity. A plot of the residuals against the
estimated values or the independent variables can
reveal when the variance is not constant, being
larger for some variables than others. When there
are too many independent variables, graphic plots
will be difficult to interpret unless subsets
of variables are examined.

3. Calculate the precision of the fitted estimate.
The residual variance is considered to be the
"error" variance of both the ANOVA and regression
models. The residual variance is calculated by
dividing the residual sum of squares, $\sum_i (Y_i - \hat{Y}_i)^2$,
by (N-K-1), where $Y_i$ is the observed response and
$\hat{Y}_i$ is the predicted response at condition i, N is
the total number of observations, and K is the
number of factors being isolated. The standard
deviation is the square root of that value; it can
be used to calculate confidence limits. The
standard error of the statistic is the standard
deviation divided by the square root of (N-K-2).

If the ratio between the residual sum of squares
and total sum of squares is subtracted from one,
the result equals the multiple correlation squared,
or the proportion of variance accounted for by
the terms of the regression equation.

4. Detect outliers and errors in data collecting,
recording, and/or calculating. Ideally, residuals
should be distributed normally with a mean of
zero. In an ordered plot on normal probability paper,
the i-th value from the bottom, $z_i$, is plotted
against a value, $a_{i/n}$,* chosen to be typical of
the i-th value from the bottom in a sample of n
from a unit normal distribution. When the plot fails
to follow a straight line, deviant individual
measures may suggest that something unusual has
occurred and should be explained.


*The value of $a_{i/n}$ represents the probability point on
the normal probability scale for the i-th position of n data
points. It has been calculated in different ways. Daniel
(1978) used the equation

$a_{i/n} = (i - 0.5)/n$

while Anscombe and Tukey (1963, p 145) suggest that

$a_{i/n} = (3i - 1)/(3n + 1)$

is a "shade better" calculation. For n = 15, this would
locate the 12th point at the .767 probability point using
Daniel's calculation and .761 using Anscombe and Tukey's.
Anscombe and Tukey propose plotting a variation
on the usual graphic plot of residuals, in order
to discover outliers and errors. They write (p 145):
"If $\tilde{z}$ represents the median of the z's, plotted at
$(0, \tilde{z})$, then the slope of the secant leading from
$(0, \tilde{z})$ to $(a_{i/n}, z_i)$ is

$(z_i - \tilde{z})/a_{i/n}$

and may usefully be calculated for i's sufficiently far
from the median. (Omitting the middle third of the i's
seems satisfactory.) An exactly straight line for the
cumulative on normal probability paper corresponds
to exactly constant values for these slopes. A plot
of $(z_i - \tilde{z})/a_{i/n}$ against i is thus quite revealing..."
By way of illustration, they suggest that the plot
would show the following characteristics for each
specific perturbation of the data:

DATA                              PLOT

• Single outlier                  Large value for i = 1
                                  or n, depending on
                                  sign of outlier

• Number of outliers              Plots turn up at
  of each sign,                   both ends
  or
  symmetrical distribution
  with more extended
  tail than normal

• Tendency to skewness            Plot higher at one
                                  end than the other


The  presence  of  an  outlier  may  mean  that  an  error 
has  occurred  or  it  may  mean  that  an  unusual  but 
valid  event  has  occurred.  At  best,  the  residual 
analysis  alerts  the  investigator  and  encourages 
him  to  search  further  for  an  explanation.  Anscombe 
and  Tukey  warn  that  with  residual  analysis  it  is 
relatively  easy  for  one  kind  of  misbehavior  of  the 
data  to  simulate  another.  For  example,  they  note 
(p  143)  that  "a  single  very  wild  value  will  (i)  act 
like  an  outlier,  (ii)  lead  to  very  non-normal  values 
of  estimated  shape  coefficients,  (iii)  indicate 
enhanced  variability  for  each  of  the  subgroups  to 
which it belongs, (iv) appear to constitute a
measurable dependence of variability of response
upon level of response." They also emphasize
the ease with which the strength of the evidence
offered by a single plot may be overrated and they
encourage the search for repeated occurrences of
a residual pattern from several bodies of data.

Anscombe  and  Tukey  also  point  out  the  dangers  of 
applying  numerical  procedures  to  data  before  outliers 
are  properly  dealt  with.  Removal  may  be  the  best 
way  to  handle  erroneous  data,  either  analyzing  the 
remaining  data  with  the  values  missing  or  assigning 
modified  values  in  their  places.  The  circumstances 
and  the  purpose  of  the  experiment  help  determine 
the best way to handle outliers.

5. Help decide when to stop adding terms in a cumulative
model.  The  results  from  an  unreplicated  screening 
design  may  be  ordered  according  to  the  magnitude  of 
the  effects  of  each  source  of  variance  and  a 
cumulative  proportion  of  variance  can  be  assigned  to 
each  succeeding  value  (Simon,  1977a,  pp  75-83;  1979, 
p  41) .  The  investigator  must  decide  where  to  draw 
the  line  between  effects  that  probably  are  critical 
and  those  that  are  not.  If  residuals  are  obtained 
using  the  subset  polynomial  and  no  unusual  patterns 
are  observed  and  a  test  of  the  lack  of  fit  finds 
that  the  model  adequately  represents  the  response 
surface,  the  investigator  may  stop  including  terms 
beyond  that  point. 

6. Indicate the adequacy of the model. If "pure" error,
as  obtained  from  repeated  measures  at  the  center  of  a 
central-composite  design,  is  isolated  from  the 
residual  variance,  the  remaining  variance  represents 
the  degree  to  which  the  estimated  model  fails  to 
fit  the  data.  When  residuals  are  plotted  against 
the  predicted  values,  the  presence  of  a  linear  or 
quadratic  trend  suggests  that  a  data  transformation 
is  needed  or  that  additional  terms  must  be  added  to 
the  model. 

7. Identify terms contributing to large interaction
effects. Valid three-factor interactions are
infrequent in human performance data (Simon,
1977a). Valid four-factor or higher interactions
are so unlikely that if their effects are large,
they should be suspect. Quite often a large
higher-order interaction may indicate the presence
of an aberrant observation. An examination of the
residuals when a questionable term is excluded from
the fitted model will often indicate which conditions
are responsible.
Hunter (1966) gives an example to show how residual
analysis may facilitate the interpretation of such
an interaction. In a $2^4$ factorial design, two of
the factors, their two-factor interaction, and the
four-factor interaction showed large effects. The
inverse Yates' algorithm was used to determine the
predicted performance when the model included only
the coefficients of the two factors and their
interaction, with all other coefficients set equal
to zero, including the non-trivial one associated
with the four-factor interaction effect. When the
residuals were obtained for each condition of the $2^4$
factorial, one condition was found to show an
unusually large response when compared with the
estimated standard deviation for the data.* It was
discovered that the large response was due to a
recording error. When it was corrected, the
aberrant four-factor interaction disappeared.

8.  Reveal  distortions  introduced  by  external  sources  of 
variance.  Residuals  should  be  uncorrelated  with 
any  external  source  of  variance.  If,  for  example, 
residuals  are  plotted  as  a  function  of  time,  we  do 
not  expect  to  find  that  they  increase  or  decrease  in 
magnitude,  become  more  or  less  variable,  or  even 
show  curvilinear  relationships  over  time.  The 
presence  of  any  of  these  patterns  suggests  that 
the  data  be  transformed  to  eliminate  the  effect  or 
that  other  terms  be  added  to  the  model  to  account 
for  time  effects. 

If the investigator is concerned that his experimental
data may be distorted by uncontrolled and
unplanned changes that took place in the environment
when the data was being collected, he may compare
the residuals associated with the different
conditions to see if they show corresponding changes.
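
The precision calculations of item 3 above can be sketched as follows. This is a minimal illustration under the definitions given there (residual sum of squares divided by N-K-1, and one minus the ratio of residual to total sum of squares); the function name `residual_precision` and the sample numbers are hypothetical.

```python
import math


def residual_precision(observed, predicted, k):
    """Residual variance, standard deviation, and squared multiple correlation.

    k is the number of terms isolated in the fitted model, so the
    residual sum of squares is divided by N - K - 1.
    """
    n = len(observed)
    dof = n - k - 1
    if dof <= 0:
        raise ValueError("N - K - 1 must be positive")
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    mean_y = sum(observed) / n
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    variance = ss_residual / dof
    return {
        "residual_variance": variance,
        "standard_deviation": math.sqrt(variance),
        "r_squared": 1 - ss_residual / ss_total,
    }


# Hypothetical example: four observations fitted by a two-term model.
print(residual_precision([0, 0, 0, 1000], [-250, 250, 250, 750], k=2))
```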

Daniel and Wood (1971, p 32) state that plotting techniques
cannot be expected to work well with fewer than 20 observations,
and are more efficient when the number exceeds fifty. Still, an
investigator may use them to enhance the effectiveness of his
visual inspection of the data prior to formal analysis, provided
he does so with restraint.
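
The plotting positions and the secant slopes discussed under item 4 can be sketched as follows. Two assumptions are made here that the text does not settle: the abscissa used for each slope is taken to be the unit-normal deviate at the Anscombe and Tukey probability point, so that the median plots at zero, and "the middle third" is taken to mean ranks above n/3 and up to 2n/3. All function names are hypothetical.

```python
from statistics import NormalDist, median


def daniel_position(i, n):
    """Probability point (i - 0.5)/n for the i-th ordered value."""
    return (i - 0.5) / n


def anscombe_tukey_position(i, n):
    """Probability point (3i - 1)/(3n + 1) for the i-th ordered value."""
    return (3 * i - 1) / (3 * n + 1)


def secant_slopes(residuals):
    """Slopes (z_i - median)/a_i for ranks outside the middle third.

    a_i is assumed to be the unit-normal deviate at the
    Anscombe-Tukey probability point for rank i.
    """
    z = sorted(residuals)
    n = len(z)
    if n < 3:
        raise ValueError("need at least three residuals")
    centre = median(z)
    slopes = {}
    for i in range(1, n + 1):
        if n / 3 < i <= 2 * n / 3:
            continue  # omit the middle third of the ranks
        a = NormalDist().inv_cdf(anscombe_tukey_position(i, n))
        slopes[i] = (z[i - 1] - centre) / a
    return slopes


print(round(daniel_position(12, 15), 3), round(anscombe_tukey_position(12, 15), 3))
```

Roughly constant slopes correspond to a straight line on normal probability paper; a slope that stands out at i = 1 or i = n points to a single outlier.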


*In this example, the standard deviation was obtained
by combining the sum of squares for the eleven terms of the
factorial that were considered trivial (i.e., all except the
two factors, their interaction, and the four-factor
interaction), dividing by N-K-1 (or 11 in this example), and
taking the square root of the result.




NAVTRAEQUIPCEN  78-C-0060-3 
SECTION  V 

IDENTIFYING THE EXPERIMENTAL CONDITIONS IN $2^{k-p}$ DESIGNS
WHEN  GIVEN  THE  DEFINING  GENERATORS 

At times, $2^{k-p}$ fractional factorial designs are described
only in terms of the defining generators.* An investigator
may wish to know what experimental conditions make up the
completed design. This information is needed to run an
experiment or, if a design is to be fractionated, to identify the
conditions to be used without needing to construct a complete
sign matrix.

To  illustrate  this  process,  let  us  begin  with  the 
following  set  of  defining  generators: 

I,  ABC,  ADE. 

Generators are independent of one another and cannot be
obtained by multiplying together other generators. Contrasts
are obtained by multiplying every factorial combination of the
defining generators by one another. In this case, with two
generators, there is only one additional combination. The
complete set of defining contrasts here would be:

I = ABC = ADE = BCDE.


This set of defining contrasts describes a quarter replicate
of a $2^5$ design containing five factors, A, B, C, D, and E.
The design requires $2^{5-2} = 2^3 = 8$ experimental conditions. It
is a Resolution III design, in which all main effects can be
isolated from one another but not from two-factor and
higher-order interactions.
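
The multiplication of generators can be sketched as follows. This is an illustrative helper, assuming effects are written as strings of capital letters and that a letter appearing twice in a product cancels; the names `multiply` and `defining_contrasts` are hypothetical.

```python
from itertools import combinations


def multiply(*words):
    """Product of effect words; a letter that appears twice cancels."""
    letters = set()
    for word in words:
        letters ^= set(word)  # symmetric difference drops repeated letters
    return "".join(sorted(letters))


def defining_contrasts(generators):
    """All products of one or more generators (the identity I is implied)."""
    contrasts = []
    for size in range(1, len(generators) + 1):
        for combo in combinations(generators, size):
            contrasts.append(multiply(*combo))
    return contrasts


contrasts = defining_contrasts(["ABC", "ADE"])
print("I = " + " = ".join(contrasts))
# The shortest word gives the resolution; 2**(k - p) gives the number of runs.
print("resolution:", min(len(w) for w in contrasts), "runs:", 2 ** (5 - 2))
```

With p generators the list holds 2^p - 1 words, which is why writing them all out becomes tedious for highly fractionated designs.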


To form different fractions (i.e., blocks of the total
factorial), the design generators may be assigned a positive
or a negative sign. The signs of the complete set of defining
contrasts will follow the usual arithmetic rules for
multiplying values with the same or different signs.** Thus:


*Readers who are unfamiliar with the terms or basic
mechanics discussed in this paper are referred to Box and
Hunter (1961), Davies (1967) or Simon (1973).

**Of course, we aren't really multiplying signs; we are
multiplying coefficients of plus or minus one. We drop the
one to make the tables less confusing.
                 Generators         Remaining Contrast
Blocks       I    ABC    ADE             BCDE

  1)         +     +      +               +   (which is the product
                                               of two plus signs)
  2)         +     +      -               -   (which is the product
                                               of a plus & minus sign)
  3)         +     -      +               -
  4)         +     -      -               +

These contrasts are actually read as a string of effects
all aliased with the Identity:

Block 1 = (I + ABC + ADE + BCDE)
Block 2 = (I + ABC - ADE - BCDE)
and so forth.


From these we can easily determine the aliases for each
effect in any block. For example, in Block 1:

(A + BC + DE + ABCDE)    (the product of A and the
                          identity string for Block 1)
(B + AC + ABDE + CDE)
(AB + C + BDE + ACDE)

and so forth. For Block 2, the string of aliases for B
would be, for example:

(B + AC - ABDE - CDE).
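
The alias strings above can be generated mechanically. The sketch below assumes each block's identity string is given as a list of signed words (the identity itself is left implicit); the names `multiply`, `alias_string`, and `show` are hypothetical.

```python
def multiply(first, second):
    """Product of two effect words; a letter that appears twice cancels."""
    return "".join(sorted(set(first) ^ set(second)))


def alias_string(effect, signed_contrasts):
    """Multiply an effect through a block's signed identity string.

    signed_contrasts: list of (sign, word) pairs, e.g. (-1, "ADE").
    Returns (sign, word) pairs beginning with the effect itself.
    """
    terms = [(+1, effect)]
    for sign, word in signed_contrasts:
        terms.append((sign, multiply(effect, word)))
    return terms


def show(terms):
    """Format a list of signed words as an alias string."""
    text = terms[0][1]
    for sign, word in terms[1:]:
        text += (" + " if sign > 0 else " - ") + word
    return "(" + text + ")"


block_1 = [(+1, "ABC"), (+1, "ADE"), (+1, "BCDE")]
block_2 = [(+1, "ABC"), (-1, "ADE"), (-1, "BCDE")]
print(show(alias_string("A", block_1)))
print(show(alias_string("B", block_2)))
```

Letters inside each word are printed alphabetically, so the word order within a string follows the identity string while the spelling of each alias is normalized.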

To find the experimental conditions in a particular
block,  we  must  assign  signs  to  each  factor  (in  this  case, 
to  A,  B,  C,  D,  and  E)  in  combinations  that  produce  the 
correct  signs  for  the  contrasts  given  for  the  particular 
block.  Let  us  use  Block  3  to  illustrate  this,  where 

Block  3  =  (I  -  ABC  +  ADE  -  BCDE) . 

Prepare at least one column for each factor. Sometimes
it helps, but is not necessary and may even be awkward as
the number of defining contrasts increases, to group them as
they appear in the contrasts. Thus we might head each
column as:

    1)  A  B  C  D  E
or
    2)  [A  B  C]  [A  D  E]  [B  C  D  E]
or
    3)  B  C  A  D  E, with brackets marking the overlapping
        groups ABC, ADE, and BCDE.
To illustrate how we build the sign matrix, let us use the second
layout where each contrast has its own columns. We must develop
eight independent experimental conditions. The steps below show
how the conditions in Table 6 were derived.*

Starting with contrast ABC, which for Block 3 has a minus
sign associated with it, we assign signs to A, B, and C individually
so that their product will always equal minus. This means
that there must be an odd number of minus signs assigned, i.e.,
either one or three. The four possible combinations for A, B, and
C that meet this criterion are -++, +-+, ++-, and ---. Next we
assign signs to A, D, and E to create a positive ADE contrast. The
available selections are restricted by the signs already assigned
to A. For this contrast, since ADE is assigned a plus sign in this
block, there must be an even number (i.e., zero or two) of minus signs in
every combination. Thus, if A is already minus, then there are two
possible combinations for D and E. One would be +, - respectively
and the other would be -, +. When A is plus, then D and E must be
either ++ or --. Having filled these two contrasts, the columns
for B, C, D, and E are already determined. If the first two contrasts
have been done correctly, the third, BCDE with a minus sign,
will also be correct, with an odd number of minuses (1 or 3) each
time. The combined signs are shown in Table 6.


TABLE 6. BUILDING THE SIGN MATRIX & IDENTIFYING CONDITIONS

          Factor Headings (Grouped by Contrast)        Block 3
       -(A  B  C)    +(A  D  E)    -(B  C  D  E)       Conditions

 1)      -  +  +       -  +  -       +  +  +  -        bcd
 2)      -  +  +       -  -  +       +  +  -  +        bce
 3)      +  -  +       +  +  +       -  +  +  +        acde
 4)      +  -  +       +  -  -       -  +  -  -        ac
 5)      +  +  -       +  +  +       +  -  +  +        abde
 6)      +  +  -       +  -  -       +  -  -  -        ab
 7)      -  -  -       -  +  -       -  -  +  -        d
 8)      -  -  -       -  -  +       -  -  -  +        e

*It is only necessary to assign the signs to all of the defining generators.
When this is done properly, the remaining defining contrasts will be correct.
When the fractional factorial is small, writing out all of the defining contrasts
could become burdensome for this purpose.
The final step is to use the sign matrix to name the
experimental conditions that make up the fractional factorial
for this particular Block 3. We do this, of course, a row
at a time by citing the letters of the factors with a plus
sign. Thus within the boxes surrounding the five factors in
the table, we find on the first row, A, B, C, D, and E have
the signs -, +, +, +, -, respectively. Since Factors B, C,
and D were given plus signs in that row we name the first
experimental condition bcd. The names of the other conditions
are shown in the right-hand column of the table.
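
The same conditions can be found by brute force, which offers a check on a hand-built sign matrix. The sketch below enumerates all 32 sign combinations of the five factors and keeps those whose generator products carry the signs required for the block; it is an illustration only, and the name `block_conditions` is hypothetical.

```python
from itertools import product


def block_conditions(factors, signed_generators):
    """Conditions of a full factorial that satisfy the signed generators.

    signed_generators: list of (sign, word) pairs, e.g. (-1, "ABC").
    Each condition is named by the lower-case letters of its plus factors.
    """
    conditions = []
    for signs in product((-1, +1), repeat=len(factors)):
        level = dict(zip(factors, signs))
        satisfied = True
        for target, word in signed_generators:
            value = 1
            for letter in word:
                value *= level[letter]
            if value != target:
                satisfied = False
                break
        if satisfied:
            name = "".join(f.lower() for f in factors if level[f] > 0)
            conditions.append(name or "(1)")
    return conditions


# Block 3: ABC carries a minus sign and ADE a plus sign.
print(block_conditions("ABCDE", [(-1, "ABC"), (+1, "ADE")]))
```

Only the generators need to be supplied; the remaining contrast, BCDE, takes its minus sign automatically. The enumeration order differs from the row order of Table 6, but the set of conditions is the same.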
SECTION VI

AN  ECONOMICAL  DESIGN  FOR  SCREENING  INTERACTION  EFFECTS 

DeGray (1968) proposes an interesting economical, multifactor
design for identifying a certain type of interaction
response during the preliminary and screening phases of a
research program. This type of interaction is characterized
by a pattern of responses in which nothing of consequence
happens as each experimental condition is tested until a
particular combination of factor levels occurs; then a
major response occurs. He refers to this type of interaction
as "coactive."

DeGray  gives  the  following  example  to  show  the  effect  to 
be  expected.  A  spark  is  passed  through  four  chemical 
atmospheres  composed  of  the  factorial  combinations  of  the 
presence  or  absence  of  two  chemicals,  hydrogen  and  oxygen, 
i.e.,

(1)  None 

a  Hydrogen  only 
b  Oxygen  only 

ab  Mixture  of  hydrogen  and  oxygen. 

A coactive interaction is indicated when nothing happens
in  the  first  three  tests  and  an  explosion  occurs  in  the 
last.  If  the  magnitudes  of  these  responses  are  designated 
0,  0,  0,  and  1000  respectively,  an  analysis  of  variance 
would  reveal  the  effects  of  the  three  sources  of  variance  to 
be 

A  500 

B  500 

AB  500 

When  the  effect  of  the  interaction  term  is  approximately  the 
same  size  as  the  main  effects,  as  in  this  example,  the 
presence  of  a  "coactive"  interaction  is  indicated.  The 
results from the analysis of variance tend to cloud the
true implications of the data.
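
The arithmetic behind DeGray's example can be checked directly. The sketch below computes each effect of a 2x2 factorial as the difference between the mean response at the high and low levels of the corresponding contrast; the function name `two_by_two_effects` is hypothetical, and the second data set is invented for comparison.

```python
def two_by_two_effects(y_1, y_a, y_b, y_ab):
    """Effects of a 2x2 factorial from the responses at (1), a, b, and ab."""
    effect_a = (y_a + y_ab) / 2 - (y_1 + y_b) / 2
    effect_b = (y_b + y_ab) / 2 - (y_1 + y_a) / 2
    effect_ab = (y_1 + y_ab) / 2 - (y_a + y_b) / 2
    return {"A": effect_a, "B": effect_b, "AB": effect_ab}


# DeGray's spark example: nothing happens until both gases are present.
print(two_by_two_effects(0, 0, 0, 1000))
# An invented purely additive case for contrast: the interaction vanishes.
print(two_by_two_effects(0, 500, 500, 1000))
```

Both data sets give the same main effects, so the main effects alone cannot distinguish a coactive response from an additive one; only the interaction term, or an inspection of the raw pattern, does.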

APPLICATIONS  TO  EQUIPMENT  DESIGN  PROBLEMS 

The  design  might  be  used  when  the  investigator  expects 
that  only  one  combination  of  equipment  parameters  in  a  2x2 
matrix will show an exceptionally strong effect. This form
of interaction is referred to as a "boom" effect. For
example,  in  the  design  of  pilot-training  simulators,  one 
needs  to  know  how  many  and  which  of  the  six  degrees  of 
motion  are  necessary  for  effective  training.  Several 
approaches can be considered. One might study the main
effects of the six degrees of motion in a screening design.
With 16 observations, all main effects can be isolated
from the effects of strings of two-factor interactions but
remain aliased with three-factor interactions. For this
particular problem, however, one might suspect that it
would be more informative to obtain and understand the
effects of combinations of degrees of motion, i.e., the
interactions, rather than the main effects. If so, the
conventional screening design would be the worst data
collection plan to use. On the other hand, it would not be
economical to do a complete factorial design to look at
all the interactions. With six degrees of motion being
present or absent, this would require the investigation of 64
combinations. A less systematic approach has sometimes been
proposed in which only particular combinations, selected
on the basis of special knowledge, might be studied, but this
is always subject to the dangers of omission, particularly
at the early phases of a research program. As an alternative
to these approaches, DeGray's coactive designs might
be employed for a preliminary economical look at the problem.

CONSTRUCTING  A  COACTIVE  DESIGN* 

The  design  requires  a  minimum  of  N  combinations  to 
identify  the  effective  interactive  (or  coactive) 
combinations  of  N  variables.  These  N  combinations  consist 
of  N  arrangements  of  the  N  variables  taken  (N-l)  at  a  time. 
Starting with all factors present (or at a high level), the
remaining combinations are made up by dropping one factor
each time (or by setting it at a low condition). DeGray
felt that it was unnecessary to include the all-high
combination; however, in behavioral studies, both all-high and
all-low (or absent) combinations might be included to provide
a frame of reference against which the other combinations
could be compared.

Thus a design for examining interactive effects among
six degrees of motion, designated A through F, would look like
this:

(1)  A  B  C  D  E 

(2)  A  B  C  D  F 

(3)  A  B  C  E  F 

(4)  A  B  D  E  F 

(5)  A  C  D  E  F 

(6)  B  C  D  E  F 


*DeGray discusses why this design differs from the
classical one-factor-at-a-time, main effect design.
where the presence or absence of a letter indicates the
presence or absence of the corresponding degree of motion.

As suggested earlier, we might also wish to include two
other combinations, ABCDEF and (1).
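
The list of combinations can be generated rather than written out by hand. This is a minimal sketch, assuming factors are named by single letters; the function name `coactive_design` and its `include_reference` flag are hypothetical.

```python
from itertools import combinations


def coactive_design(factors, include_reference=False):
    """N combinations of the N factors taken N-1 at a time.

    With include_reference=True the all-high combination is placed
    first and the all-low combination, written (1), is placed last.
    """
    runs = ["".join(c) for c in combinations(factors, len(factors) - 1)]
    if include_reference:
        runs = ["".join(factors)] + runs + ["(1)"]
    return runs


for number, run in enumerate(coactive_design("ABCDEF"), start=1):
    print(f"({number})  {' '.join(run)}")
```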

INTERPRETATION  OF  THE  DATA 

The data would not be combined arithmetically.
Instead, the pattern of responses would be examined. For
example,  if  all  of  the  combinations  gave  a  positive  response 
except  No.  3,  then  it  would  be  Variable  D  that  probably 
caused  the  reaction.  If  combinations  Numbers  4  and  6  both 
failed  to  give  an  adequate  response,  the  interaction  between 
A  and  C  would  be  suspected  of  causing  the  reaction.  If 
Numbers  3,  4,  and  6  failed  to  react,  then  interaction  ACD 
would  be  suspected.  On  the  other  hand,  if  combinations  of 
variables  acted  as  inhibitors,  the  pattern  of  responses 
would  be  interpreted  in  reverse.  The  investigator  should  be 
able  to  determine  which  of  the  opposing  situations  exists, 
either  by  using  knowledge  he  already  has  or  by  making  a  few 
additional  observations. 
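
The pattern reading described above amounts to collecting the factors that are absent from every combination that failed to respond. The sketch below illustrates this for the six-factor design; it assumes clear-cut positive and negative responses and factors that act as promoters rather than inhibitors, and the name `suspected_factors` is hypothetical.

```python
def suspected_factors(factors, runs, failed_run_numbers):
    """Factors missing from the combinations that failed to respond.

    runs: the coactive combinations in order, numbered from 1.
    Returns the suspected factor or interaction as a string of letters.
    """
    missing = set()
    for number in failed_run_numbers:
        missing |= set(factors) - set(runs[number - 1])
    return "".join(sorted(missing))


runs = ["ABCDE", "ABCDF", "ABCEF", "ABDEF", "ACDEF", "BCDEF"]
print(suspected_factors("ABCDEF", runs, [3]))        # single failure
print(suspected_factors("ABCDEF", runs, [4, 6]))     # two failures
print(suspected_factors("ABCDEF", runs, [3, 4, 6]))  # three failures
```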

DeGray points out that the design works best when a
positive response is definitely positive and a negative
response is definitely negative. This may not be typical of
behavioral data, and certainly not with the clarity that
might be found in certain chemical reactions. The design
still can be used if an investigator suspects certain
interactions might be important and inspects the data accordingly.
If used cautiously, this might be the most economical means
of unraveling a great many two-factor interaction effects
confounded in strings in the Resolution IV screening design.

Interpreting  the  results  from  this  design  can  be 
affected  if  sequence  effects  are  operating.  Simon  (1974)  has 
suggested  ways  that  these  might  be  minimized  during  the  data 
collection phase. Furthermore, in our degrees of motion
example,  no  consideration  was  given  to  the  fact  that  other 
equipment  variables  may  be  operating  and  could  interact  with 
the  other  degrees  of  motion.  These  might  be  held  constant 
in  an  exploratory  study  or  included  in  the  coactive  design 
if  the  investigator  deems  it  necessary. 

It  should  be  obvious  that  the  investigator  must  take 
the  same  precautions  with  this  as  with  any  other  experimental 
design,  whether  exploratory  or  primary.  The  technique  is 
worth  trying  and  gives  a  systematic  economical  approach  to 
questions  that  otherwise  could  be  costly  to  answer.* 


*Cotter (1979), without referencing DeGray's work, presents an expanded
(2n + 2) design composed of the n conditions in DeGray's coactive design,
plus n "foldover" conditions, plus one condition with all factors at the high
level and one with all at the low level. Though slightly better than
DeGray's, the precision of this design is still poor.
GRAPHIC METHOD AND INTERNAL COMPARISON
FOR  MULTIPLE  RESPONSE  DATA 

The $2^{k-p}$ screening designs described by Simon (1973,
1977a) do not rely on replication to provide the error
variance needed in a test of statistical significance of the
effects. Daniel (1959, 1976; also see Simon, 1977a,
pp 38-98) proposed the use of order statistics and internal
comparison procedures to detect those uniresponse effects
that were probably not due to chance. Internal comparison
procedures permit simultaneous comparisons among effects
with the aid of a statistical standard which is, at least in
part, generated internally by the data. The process is
conceptually simple. If the univariate effects (or contrasts)
determined from the analysis of the fractional factorial
(screening design) are due to chance, when ordered and
plotted properly on normal probability paper* they will lie
along a straight line. Those that are larger than might be
expected by chance will deviate noticeably from the line.

Wilk and Gnanadesikan (1964; also see Roy, Gnanadesikan,
and Srivastava, 1971, Chapter VIII; also see Simon, 1977a, pp
151-158 for a general description of the procedure) describe
how to use a similar graphic method to evaluate the effects
from an unreplicated design when there are multiple responses.
While there are times when the purpose of the mission
(and not the statistics) determines the relative weights to
be assigned to multiple criteria, there will be circumstances
when the investigator may wish to rely on simple statistical
combinations that take into consideration the correlated
relationships. Since the procedure proposed by Wilk and
Gnanadesikan is rather involved and is explained in matrix
algebra terms, it is described below along with an example to
facilitate its use.

THE  GENERAL  APPROACH 

The procedure in some respects is analogous to Daniel's
wherein the multivariate responses of the $2^{k-p}$ effects are
tested simultaneously by means of graphical internal
comparisons. In this approach, we will need a multivariate
estimate of the effect and suitable graph paper on which to
plot it. The multivariate estimate will be the squared
Euclidean distance between the centroids (of high and low
conditions) in the multivariate space. These will be
plotted on the appropriate quantiles of the standard gamma
distribution. If the null hypothesis is correct, the plots


*  In  the  original  article,  Daniel  (1959)  proposed  using 
half-normal  probability  paper  on  which  to  plot  the  data. 
Later,  in  his  1976  book,  he  suggested  that  more  information 
would  be  revealed  if  full  normal  probability  paper  were  used. 
will lie along a straight line. If it is not correct, the
largest effects will appear to curve away from the straight
line configuration. This procedure is somewhat more
complicated to apply than is the plotting of univariate effects
on normal probability paper since two parameters, λ and η,
of the gamma distribution must be estimated.

Let  us  illustrate  the  procedure  step  by  step,  using  a 
numerical  example  prepared  by  Weinman  (1979) . 

THE  WORKING  DATA 

The fictitious data for this example are shown in the
first three columns of Table 7. In the first column are the
32 experimental conditions of a $2^5$ factorial design listed
in Yates' standard order. In the next two columns are the
values for the two responses, $y_1$ and $y_2$, for each condition.
The second variable, $y_2$, was actually created by adding
random digits (+0, +1, ..., +9) to $y_1$.

OBTAINING  THE  SQUARED  DISTANCES 

The next two columns, Y1 and Y2, give the effects (mean
differences or contrasts) for the values in y1 and y2,
respectively.  The effects are obtained by applying Yates'
algorithm to the columns for y1 and y2 separately.  Since the
effects are arranged in standard order, the sources of their
identities are composed of the same letters as the
conditions on the same line.
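
The passes of Yates' algorithm can be sketched in a few lines. This is a minimal illustration, assuming the responses are supplied in Yates' standard order and that an effect is the contrast total divided by half the number of conditions (a mean difference); the function name `yates_effects` and the small 2^2 data set are hypothetical and are not taken from Table 7.

```python
def yates_effects(y):
    """Apply Yates' algorithm to responses listed in Yates' standard order.

    Returns (grand_mean, effects), where effects[j] is the mean difference
    for the (j + 1)-th source in standard order (A, B, AB, C, ...).
    """
    n = len(y)
    k = n.bit_length() - 1
    if n < 2 or 2 ** k != n:
        raise ValueError("number of conditions must be a power of two")
    column = list(y)
    for _ in range(k):
        # Each pass: pairwise sums on top, pairwise differences below.
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs
    grand_mean = column[0] / n          # first entry is the grand total
    effects = [total / (n / 2) for total in column[1:]]
    return grand_mean, effects


# Hypothetical 2^2 illustration with conditions (1), a, b, ab:
mean, effects = yates_effects([1.0, 3.0, 2.0, 6.0])
```

For a 32-condition design the same function would be called once on the y1 column and once on the y2 column, giving the two effect columns separately.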

To calculate the squared distances for the Identity
Matrix, sum (Y1^2 + Y2^2).  For example, using the Y1 and Y2
values for the S effect in Table 7, i.e., -4 and -5, the
squared distance for S is (-4)^2 + (-5)^2 = 41.  These are
listed in Table 7 in column six, and the source of each is
the same as for the individual effects.  If there had been
more than two responses, the squared distance for the
Identity column would have been the sum of the squared
effects over all of the responses (Y1^2 + Y2^2 + ...).  This
calculation assumes that the relationship among the
responses is a simple linear one, that is, they are weighted
equally without standardization.
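
The Identity-matrix distance is simply a sum of squared effects, which can be checked with a short sketch. The function name `squared_distance_identity` is an illustrative assumption; the two effect pairs are the S and D rows quoted from Table 7.

```python
def squared_distance_identity(effects):
    """Sum of squared effects across responses (Identity compounding matrix)."""
    return sum(e * e for e in effects)


# Effect pairs (Y1, Y2) for sources S and D as listed in Table 7.
distance_s = squared_distance_identity([-4.0, -5.0])
distance_d = squared_distance_identity([7.875, 8.312])
```

The same function accepts any number of responses, since each response contributes one squared effect to the sum.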

DETERMINING  THE  PARAMETERS  OF  THE  GAMMA  PLOT 

First  determine  how  many  of  the  K  squared  distances 
are  likely  to  be  trivial.  Ordinarily  the  higher-order 
interactions would be selected.  In our example, all three-factor
and higher-order interactions (16 distances; Table 7, column 1)
were chosen.  Of these, the M smallest squared
distances  are  selected  so  that  the  ratio  K/M  is  at  least 
3/2  =  1.5,  and  preferably  slightly  larger.  Avoid  any 
potentially  critical  distances.  In  our  example,  the  eight 
distances  of  the  Identity  matrix  (Table  7,  column  6)  with 
TABLE 7.  RAW DATA, EFFECTS, AND SQUARED DISTANCE

Experimental      Responses          Effects†        Squared Distances
Conditions        y1       y2       Y1        Y2     Identity    S^-1

(1)               3.5      7.0    -0.062     0.094
s                11.5    -15.5    -4.000    -5.000     41.0      2.34
d                 8.0     10.0     7.875     8.312    131.1      8.36
sd               -2.0     -2.0     2.500     2.875     14.5      0.82
n                 4.5      9.0     1.562     1.938      6.2      0.35
sn              -18.0    -19.0     1.688     2.250      7.9      0.50
dn                4.0      6.5     2.562     2.688     13.8      0.89
sdn*              5.5      3.5     0.938     1.000      1.9*     0.12*
p                -1.0      0.5    -2.625    -3.375     21.9      1.74
sp              -14.5    -18.0     1.500     2.688      9.5      1.27
dp                9.0      5.5    -0.250     0.125      0.1      0.17
sdp*             -4.5     -9.0    -5.875    -7.062     84.4      3.45
np              -11.5    -15.5    -2.562    -1.875     10.1      1.82
snp*             -6.5     -6.5     0.562     1.438      2.4*     0.67*
dnp*              1.0      3.0     0.562     3.250     10.9      6.68
sdnp*             3.0      3.5    -0.312    -1.438      2.2*     1.16
k                 2.5      0.5     3.750     4.812     37.2      2.19
sk              -10.0     -9.0     4.250     6.125     55.6      4.14
dk               -4.0      0      -2.000    -1.562      6.4      0.97
sdk*             -7.0     10.0    -0.750    -1.000      1.6*     0.10*
nk                6.0     10.5     2.188     1.188      6.2      2.18
snk*              0.5      0      -3.062    -3.500     21.6      1.23
dnk*              7.5      3.5     1.188     0.438      1.6*     0.96
sdnk*            16.5     17.5    -1.094     0.750      1.7*     3.94
pk               -4.5     -4.5    -0.250     0.625      0.4      0.80
spk*              7.5      9.5    -1.750    -2.562      9.6      0.75*
dpk*              2.5      3.0    -0.875    -0.625      1.2*     0.22*
sdpk*            -1.5     -2.0    -1.875    -2.062      7.8      0.46*
npk*             -5.5     -0.5    -2.438    -2.500     12.2      0.83*
snpk*            -5.5     -5.5    -3.188    -3.438     22.0      1.35
dnpk*             9.0     11.0     1.438     1.875      5.6*     0.34*
sdnpk*            1.0      4.0     2.438     1.938      9.7      1.39

†The sources for the effects (and the squared distances) are
labelled with capital letters that correspond to the letters of
the Experimental Condition on the same row.  For example, Factor S
has effect -4 for Y1 and -5 for Y2.
an asterisk beside them were chosen*.  This made the K/M
ratio equal 16/8 = 2, an acceptable value.

The M values just selected will be used to obtain an
estimate of η, the shape parameter of the gamma distribution.
Two parameters, P and S, are needed.

P  =  geometric mean of the M distances divided
by the largest of the M values.  The geometric
mean is equal to the product of the
M distances raised to the power 1/M.

S  =  arithmetic mean of the M distances divided
by the largest of the M values.  The arithmetic
mean equals the sum of the M distances
divided by M.

In our example, P = 2.034/5.6 = 0.36, and S = 2.275/5.6
= 0.41.  To find η (eta), use Table IV, beginning on page
198, in Roy, Gnanadesikan, and Srivastava's book (1971).**
Find the table for the appropriate K/M ratio and then look up
P and S.  It may be necessary to make a bilinear interpolation
to properly represent your values if they are located
between those listed in the tables.
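
The two table-entry quantities can be computed directly from the M selected distances. The sketch below follows the verbal definitions of P and S given above; the function name `gamma_shape_inputs` is an illustrative assumption, and the eight values are the asterisked Identity distances from Table 7.

```python
import math


def gamma_shape_inputs(distances):
    """Return (P, S): geometric and arithmetic means over the largest value."""
    m = len(distances)
    largest = max(distances)
    geometric_mean = math.exp(sum(math.log(d) for d in distances) / m)
    arithmetic_mean = sum(distances) / m
    return geometric_mean / largest, arithmetic_mean / largest


# The eight asterisked Identity distances from Table 7.
selected = [1.9, 2.4, 2.2, 1.6, 1.6, 1.7, 1.2, 5.6]
P, S = gamma_shape_inputs(selected)
```

Working with logarithms avoids forming the raw product of the distances, which matters only when M is large or the distances are very small.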

A  bilinear  interpolation  can  be  made  using  the  following 
equation: 

$$ \eta = \eta_{11}(1 - a - b + ab) + \eta_{21}(a - ab) + \eta_{12}(b - ab) + \eta_{22}(ab) $$

where

$$ a = \frac{P - P_1}{P_2 - P_1} \quad \text{and} \quad b = \frac{S - S_1}{S_2 - S_1} $$

and P and S are the values you obtained, and P1, P2, S1, and
S2 are the two pairs of values in the tables that bracket P
and S.  η_ij is the tabled value at P_i and S_j.
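
The interpolation formula translates directly into code. This is a minimal sketch; the function name `bilinear_eta` and its argument order are assumptions made for illustration, and treating a coincident pair of bracketing values (P1 = P2 or S1 = S2) as a weight of zero is a convention adopted here to avoid dividing by zero.

```python
def bilinear_eta(p, s, p1, p2, s1, s2, eta11, eta21, eta12, eta22):
    """Bilinear interpolation of eta; eta_ij is the tabled value at (P_i, S_j)."""
    # If the tabled values coincide, no interpolation is needed in that dimension.
    a = 0.0 if p2 == p1 else (p - p1) / (p2 - p1)
    b = 0.0 if s2 == s1 else (s - s1) / (s2 - s1)
    return (eta11 * (1 - a - b + a * b)
            + eta21 * (a - a * b)
            + eta12 * (b - a * b)
            + eta22 * (a * b))


# Values from the worked example: P = .36 is tabled, S = .406 lies
# between the tabled .40 and .44.
eta = bilinear_eta(0.36, 0.406, 0.36, 0.36, 0.40, 0.44,
                   1.443, 1.443, 1.306, 1.306)
```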

In  our  example,  for  K/M  =  2.0,  we  find  that  P  =  .36  is 
one  of  the  values  listed;  we  are  lucky.  Our  S  of  0.406, 
however,  is  between  the  tabled  values  of  .40  and  .44,  where 


*In retrospect, one might question whether or not
distance x8 = 5.6 should have been included in the M set, since
it appears to be unduly large.

**These  tables  are  also  given  in  an  article  by  Wilk, 
Gnanadesikan,  and  Huyett  (1962). 
the η's are 1.443 and 1.306, respectively.  The calculations
for making the bilinear interpolations are shown in Table 8
for this data (although in this case, no interpolation was
needed for the P dimension).  The η is estimated to be 1.4.

To find the quantiles of the gamma distribution, we need
Table VII beginning on page 208 in Roy, Gnanadesikan, and
Srivastava's book (1971)*.  The percentages and associated
quantile values will be found in that table under the ETA
value just determined.  In our example, under ETA = 1.4,
we find the following values listed:


Percentage        Quantile

   0.1            8.4321789E-03
   0.5            2.6824095E-02
   0.7            3.5971068E-02
        on up to
  99.5            6.2058056E 00
  99.9            7.9005137E 00

With these pairs of values serving as coordinates, we plot
the function relating percentage to quantiles (see Figure 4).
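
Where the printed tables are not at hand, the same quantiles can be approximated numerically. The following is a sketch under stated assumptions: a standard gamma distribution with unit scale and shape η, the cumulative probability evaluated from the power series for the incomplete gamma function, and the quantile found by bisection. The names `gamma_cdf` and `gamma_quantile` are illustrative, and this numerical route is a substitute for the table lookup, not the procedure used in the report.

```python
import math


def gamma_cdf(x, eta):
    """Cumulative probability of a unit-scale gamma variate with shape eta."""
    if x <= 0.0:
        return 0.0
    # Series: x^eta e^-x / Gamma(eta) * sum_n x^n / (eta (eta+1) ... (eta+n))
    term = 1.0 / eta
    total = term
    n = 0
    while term > 1e-17 * total and n < 10000:
        n += 1
        term *= x / (eta + n)
        total += term
    return total * math.exp(-x + eta * math.log(x) - math.lgamma(eta))


def gamma_quantile(prob, eta):
    """Quantile x such that gamma_cdf(x, eta) equals prob, for 0 < prob < 1."""
    low, high = 0.0, 1.0
    while gamma_cdf(high, eta) < prob:
        high *= 2.0                     # expand until the target is bracketed
    for _ in range(100):
        mid = 0.5 * (low + high)
        if gamma_cdf(mid, eta) < prob:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Quantiles at three percentages for eta = 1.4.
quantiles = [gamma_quantile(pct / 100.0, 1.4) for pct in (0.1, 50.0, 99.9)]
```

The series has only positive terms, so it is numerically stable over the range of quantiles needed for a plot of this kind.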


*These tables can also be found in an article by Wilk,
Gnanadesikan, and Huyett (1962).




TABLE 8.  EXAMPLE OF BILINEAR INTERPOLATION

                 P1 = .36 = P2

S1 (.40)     η11 = 1.443     η21 = 1.443
S2 (.44)     η12 = 1.306     η22 = 1.306

a = (.36 - .36) / (.36 - .36), taken as 0 because P1 = P2
b = (.406 - .40) / (.44 - .40) = .15

η = η11 (1-a-b+ab) + η21 (a-ab) + η12 (b-ab) + η22 (ab)
  = 1.443 (1-0-.15+0) + 1.443 (0-0) + 1.306 (.15-0) + 1.306 (0)
  = 1.443 (.85) + 1.306 (.15)
  = 1.42 or 1.4
THE PLOT

Table 9 is to be prepared.  The first column consists
of the ranks listed from 1 to N-1, in our case, 31.  Next,
the percentage, b_i, associated with each rank is calculated
using the following equation:

$$ b_i = \frac{i - 0.5}{L} \times 100 $$

for rank i = 1, 2, ..., L, where L = N-1, or 31 in our
example.  Later, if the investigator wishes to drop some of
the largest distances, he would go through this same
procedure, but would use the smaller value of L.  These
percentage values would be listed beside the appropriate
rank as shown in Table 9, column 2.  Next we will list in
column 3 the quantiles associated with each of the
percentages.  These are determined from the function drawn in
Figure 4.  For example, b_16, or 50.0 percent, corresponds to
quantile x_16, or 1.08.  In column 4, we list the squared
distances of the Identity matrix found in Table 7, but ordered
from the smallest to the largest.  In column 5, the source
associated with each distance is given.
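
The first, second, fourth, and fifth columns of such a table can be generated mechanically. A minimal sketch, assuming the distances are held in a dictionary keyed by source; the names `plotting_percentages` and `ordered_distances` are illustrative, and the three-entry dictionary is only a small excerpt of the Table 7 values.

```python
def plotting_percentages(L):
    """Percentages b_i = (i - 0.5) / L * 100 for ranks i = 1, ..., L."""
    return [(i - 0.5) / L * 100 for i in range(1, L + 1)]


def ordered_distances(distance_by_source):
    """Return (rank, percentage, distance, source) rows, smallest distance first."""
    ranked = sorted(distance_by_source.items(), key=lambda item: item[1])
    percents = plotting_percentages(len(ranked))
    return [(rank, pct, dist, source)
            for rank, (pct, (source, dist))
            in enumerate(zip(percents, ranked), start=1)]


percentages = plotting_percentages(31)

# Excerpt only: three of the 31 Identity distances from Table 7.
rows = ordered_distances({"S": 41.0, "DP": 0.1, "D": 131.1})
```

Each row would then be completed with the gamma quantile for its percentage before plotting distance against quantile.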

L  points  are  next  plotted  on  ordinary  graph  paper  with 
the  squared  distances  on  the  ordinate  and  the  quantiles  on 
the  abscissa.  The  31  points  in  our  example  are  shown 
plotted  in  Figure  5.  Although  reasonable  care  should  be 
taken  in  the  plotting,  for  the  smaller  ranks,  it  may  not  be 
necessary  to  plot  every  point. 

INTERPRETATION 


Inspection  of  Figure  5  shows  that  the  distances  at  the 
lower  ranks  tend  to  lie  along  a  straight  line*.  At  the 
higher ranks, certain distances begin to deviate above the


*  Actually,  a  break  is  visible  between  the  9th  and  10th 
point,  creating  two  straight  lines  with  approximately  the 
same  slope.  This  suggests  that  the  error  variance  is  not 
homogeneous.  Weinman  (1979)  proposes  a  possible  explanation 
for this based on the fact that the y1 data had been taken
from data published by Yates.  He writes:  "Yates mentions
a  ridge  of  fertility  running  through  the  field.  This  could 
account  for  such  a  jump.  It's  likely  the  hypothesis  of 
homogeneous  variances  is  not  correct.  I  would  also  note  that 
the  field  was  laid  out  in  four  blocks  and  none  of  the  14 
longest  distances  come  from  block  three.  This  suggests  that 
block  three  has  a  different  effect  on  yields  from  the  other 
three  blocks,  possibly  because  of  the  ridge  of  fertility." 
TABLE 9.  DATA REQUIRED TO PLOT THE ORDERED
DISTANCES FOR THE IDENTITY MATRIX

                                    Ordered
Ranks   Percentages   Quantiles    Distances    Sources

  1         1.6          .06          0.1       DP
  2         4.8          .14          0.4       PK
  3         8.1          .21          1.2       DPK
  4        11.3          .27          1.6       DNK
  5        14.5          .33          1.6       SDK
  6        17.7          .39          1.7       SDNK
  7        21.0          .46          1.9       SDN
  8        24.2          .52          2.2       SDNP
  9        27.4          .58          2.4       SNP
 10        30.6          .64          5.6       DNPK
 11        33.9          .71          6.2       N
 12        37.1          .78          6.2       NK
 13        40.3          .85          6.4       DK
 14        43.5          .92          7.8       SDPK
 15        46.8         1.00          7.9       SN
 16        50.0         1.08          9.5       SP
 17        53.2         1.16          9.6       SPK
 18        56.5         1.25          9.7       SDNPK
 19        59.7         1.34         10.1       NP
 20        62.9         1.45         10.9       DNP
 21        66.1         1.55         12.2       NPK
 22        69.4         1.68         13.8       DN
 23        72.6         1.81         14.5       SD
 24        75.8         1.95         21.6       SNK
 25        79.0         2.12         21.9       P
 26        82.3         2.32         22.0       SNPK
 27        85.5         2.55         37.2       K
 28        88.7         2.83         41.0       S
 29        91.9         3.20         55.6       SK
 30        95.2         3.75         84.4       SDP
 31        98.4         4.82        131.1       D
Figure 5.  Plots of Ordered Distances Against
Quantiles for the Identity Matrix.
(L = 31, K = 16, M = 8, η = 1.4)
line extrapolated from the line formed by the points at the
left.  The interpretation process corresponds to that for
uniresponse graphic plots, i.e., those distances lying off
the line are larger than might be expected by chance.  In
this example, distances for D, SDP, SK, S, and K are well
above the line, and SNPK, P, and SNK are worthy of further
investigation.  One might repeat this entire process after
dropping the four largest distances, reducing L to 27.  This
procedure might be repeated several times for the remaining
larger distances.

The  graphic  process  provides  one  additional  criterion 
to  help  the  investigator  interpret  his  data.  The  larger  the 
number  of  effects,  the  more  likely  the  idealistic  principles 
will  behave  properly  (providing  the  experimenter  has  done 
the  rest  of  his  job  properly) .  Certainly  the  multivariate 
nature of these effects makes any interpretation more
complex than would be the case with a uniresponse design.  This
complexity  is  further  increased  by  the  aliasing  present  in 
the  fractional  factorial  designs.  Only  experience  will 
overcome  these  difficulties.  For  the  present,  however,  the 
investigator  should  check  and  double  check  his  conclusions 
against  a  variety  of  criteria,  using  the  analyses  to  assist 
him  rather  than  to  lead  him.  Gnanadesikan  (1963)  emphasizes 
that  the  use  of  a  multiresponse  analysis  does  not  preclude 
examining  each  response  singly  to  better  understand  the 
multiresponse  data. 

OTHER  MULTIVARIATE  MODELS 

Most  statistical  analysis  is  based  on  a  linear  ordering 
of  the  data.  This  is  a  characteristic  of  real  numbers  but 
not  of  vectors  or  combinations  of  real  numbers.  Thus,  the 
ordering  of  multivariate  data  involves  the  use  of  a  metric 
or some measure of size*.  Distance, d_i, does not have to be
measured as "squared distance" as was done in the above
example.  Actually the complete expression should be:

$$ d_i = x_i' A x_i, \quad i = 1, 2, \ldots, K $$

where A is a compounding matrix, positive semi-definite,
but otherwise selected at the discretion of the investigator.
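
The general quadratic form can be sketched as follows. The function name `quadratic_distance` is an illustrative assumption, and the variances of 4 and 25 used to build the diagonal matrix are hypothetical numbers chosen only to show how a different compounding matrix changes the result.

```python
def quadratic_distance(x, A):
    """Quadratic form x' A x for an effect vector x and compounding matrix A."""
    size = len(x)
    return sum(x[r] * A[r][c] * x[c] for r in range(size) for c in range(size))


effect_s = [-4.0, -5.0]                 # (Y1, Y2) for source S in Table 7

identity = [[1.0, 0.0],
            [0.0, 1.0]]
# Hypothetical variances (4 and 25) on the diagonal, as reciprocals.
scaled = [[1.0 / 4.0, 0.0],
          [0.0, 1.0 / 25.0]]

d_identity = quadratic_distance(effect_s, identity)
d_scaled = quadratic_distance(effect_s, scaled)
```

With the identity matrix the function reduces to the plain sum of squared effects; any other positive semi-definite A reweights or cross-weights the responses.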

The Identity matrix used in our illustration weights the
variables Y1 and Y2 equally and does not take into account
any correlation that might occur between the two.  For that


*  Barnett (1976) has written a comprehensive article
describing the problems and proposed solutions for
ordering multivariate data.
model, the compounding matrix A is a unit matrix, which has
ones on the principal diagonal and zeros off the diagonal.

A slightly more complex matrix would be a diagonal
matrix with reciprocals of the variance of Y1 and Y2 on the
diagonal.  This is equivalent to defining "distance" as

$$ d = \frac{Y_1^2}{S_1^2} + \frac{Y_2^2}{S_2^2} $$

The variables would be "adjusted for size," but would still be
treated separately with their effects simply added together.

THE S^-1 MATRIX


To take into account the covariation among responses, a
non-diagonal matrix is required.  A standard matrix used for
this purpose in multivariate analysis is S^-1, where S is a
matrix of estimates of variances and covariances of the
variables.  That is,

$$ S = \begin{bmatrix} S_1^2 & S_{12} & \cdots \\ S_{21} & S_2^2 & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix} $$

where the diagonal elements, S_i^2, are the sample variances of
the variables and the off-diagonal elements, S_ij, are sample
covariances between variables i and j.  That is,

$$ S_i^2 = \frac{\sum Y_i^2 - \dfrac{(\sum Y_i)^2}{N}}{N - 1} $$

and

$$ S_{ij} = \frac{\sum Y_i Y_j - \dfrac{(\sum Y_i)(\sum Y_j)}{N}}{N - 1} $$

In general, one would use a computer both to calculate the S
matrix and to invert it (i.e., find S^-1).  Computer packages
for these purposes are readily available.
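In the original example, with only two responses, the
S matrix is only 2 x 2 and inversion is simple.  If

$$ S = \begin{bmatrix} S_1^2 & S_{12} \\ S_{12} & S_2^2 \end{bmatrix} $$

then

$$ S^{-1} = \frac{1}{S_1^2 S_2^2 - S_{12}^2} \begin{bmatrix} S_2^2 & -S_{12} \\ -S_{12} & S_1^2 \end{bmatrix} $$

The distances, D, are defined as

$$ D = (Y_1 \;\; Y_2)\, S^{-1} \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} $$

or in our example,

$$ D = \frac{S_2^2 Y_1^2 - 2 S_{12} Y_1 Y_2 + S_1^2 Y_2^2}{S_1^2 S_2^2 - S_{12}^2} $$

The D's calculated in this manner are shown in the last
column of Table 7.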
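
For two responses the whole calculation fits in a few lines without a matrix package. This is a sketch under stated assumptions: the section does not say which observations enter S, so the three data pairs below are purely hypothetical, and the names `covariance_matrix`, `invert_2x2`, and `s_inverse_distance` are illustrative.

```python
def covariance_matrix(rows):
    """Sample variance-covariance matrix S from rows of (Y1, Y2, ...) values."""
    n = len(rows)
    size = len(rows[0])
    sums = [sum(row[i] for row in rows) for i in range(size)]
    return [[(sum(row[i] * row[j] for row in rows) - sums[i] * sums[j] / n)
             / (n - 1)
             for j in range(size)]
            for i in range(size)]


def invert_2x2(S):
    """Inverse of a 2 x 2 matrix via the determinant formula."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    if det == 0:
        raise ValueError("S is singular and cannot be inverted")
    return [[S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det, S[0][0] / det]]


def s_inverse_distance(y, S_inv):
    """Distance D = y' S^-1 y for a two-element effect vector y."""
    return sum(y[r] * S_inv[r][c] * y[c] for r in range(2) for c in range(2))


# Hypothetical observations, used only to exercise the functions.
S = covariance_matrix([(1.0, 2.0), (2.0, 4.0), (3.0, 7.0)])
S_inv = invert_2x2(S)
D = s_inverse_distance([1.0, 0.0], S_inv)
```

With more than two responses the determinant formula no longer applies and a general matrix inversion routine would be needed.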


The procedure from this point on, given these distances
based on the S^-1 matrix rather than the Identity matrix, would
be the same.  When L = 31, K = 16, and M = 8, the eight
smallest distances (among the 3-factor or higher interactions)
are marked with asterisks in the last column.  Note that
five of the smallest are the same as with the Identity
matrix, but three are not.  For those readers who wish to
work through the problem themselves, Weinman provides the
following values for comparison:


P = .41,  S = .53,  η = 1.41*

The quantiles and the ordered distances when the S^-1 matrix
is used are shown in Table 10.  These should be plotted as
before.


When we compare the results obtained when the Identity
and the S^-1 matrices are used, we find that D is the largest
effect in both and that SK and SDP are large in both.  But
DNP and SDNK, which are not large when the Identity matrix is
used (i.e., when "distance" is defined as the sum of the
performance measures squared), are large when the S^-1 matrix
(i.e., the sum of the reciprocal of the variances of each
performance measure with their covariation taken into
account) is used.  Weinman (1979) explains this as follows:
"DNP is "found" by the S^-1 matrix because the value of Y2 at
DNP is much larger than that of Y1.  A similar statement is
true at SDNK, where the sign of Y1 is negative while the
sign of Y2 is positive."**

OTHER  COMPOUNDING  MATRICES 

Roy, Gnanadesikan, and Srivastava (1971, p 103) state,
regarding the choice of the compounding matrix, A:  "A
truly multivariate situation cannot usually be fully
described by a single one-dimensional representation and,


*It is happenstance that the values of η are the same
for the Identity and S^-1 matrices in this example.

**The results from this analysis cannot be related to
anything in the real world since the data for Y2 were
contrived.
TABLE 10.  DATA REQUIRED TO PLOT THE ORDERED
DISTANCES FOR THE S^-1 MATRIX

                      Ordered
Ranks   Quantiles    Distances    Sources

  1        .06         0.10       SDK
  2        .14         0.12       SDN
  3        .21         0.17       DP
  4        .27         0.22       DPK
  5        .33         0.34       DNPK
  6        .39         0.35       N
  7        .46         0.46       SDPK
  8        .52         0.50       SN
  9        .58         0.67       SNP
 10        .64         0.75       SPK
 11        .71         0.80       PK
 12        .78         0.82       SD
 13        .85         0.83       NPK
 14        .92         0.89       DN
 15       1.00         0.96       DNK
 16       1.08         0.97       DK
 17       1.16         1.16       SDNP
 18       1.25         1.23       SNK
 19       1.34         1.27       SP
 20       1.45         1.35       SNPK
 21       1.55         1.39       SDNPK
 22       1.68         1.74       P
 23       1.81         1.82       NP
 24       1.95         2.18       NK
 25       2.12         2.19       K
 26       2.32         2.34       S
 27       2.55         3.45       SDP
 28       2.83         3.94       SDNK
 29       3.20         4.14       SK
 30       3.75         6.68       DNP
 31       4.82         8.36       D
therefore, it should be recognized that, for any given
problem, it is always advisable to try several different
compounding matrices in the calculation of the squared
distances {d_i}.  In fact, it would often be desirable not
only to try out several measures of size or squared
distances which are positive semi-definite quadratic forms,
but also to try different kinds of measures of size or
distance functions.  The problem of the choice of A is
under continuing study and a preliminary report of certain
findings of the study may be found in Wilk et al. (1962)."
They point out that any interpretation of the results must
be made conditional upon the choice of the A matrix.


NAVTRAEQUIPCEN  78-C-0060-3 
SECTION  VIII 

THE PLACE FOR REPLICATION IN ECONOMICAL
MULTIFACTOR RESEARCH


To  psychologists,  replicating  an  experimental  design  is 
as  natural  as  breathing  and  occurs  just  as  unconsciously. 
Unfortunately,  it  is  a  costly  procedure  when  economy  in 
multifactor  research  is  paramount;  fortunately,  it  is  not 
always  necessary.  It  is  important  that  the  investigator 
recognize  the  different  situations  in  which  replication  may 
or  may  not  be  desirable  or  required. 

REPLICATING REQUIRED LESS IN THE EARLY PHASES OF RESEARCH

Before  an  unbiased  model  of  a  response  surface  has  been 
derived, replicating a design is generally not cost
effective.  During the screening phase, it is more productive to
use any extra data-collection effort to add new points to
the fractional factorial than it is to repeat original
conditions.  For example, if one were to replicate the
conditions of a Resolution III design, the only additional
information that would be obtained would be an estimate of
the error variance.  If instead of replicating, however,
data were collected to complete a second properly selected
Resolution III design, although no error would be externally
estimated, main and two-factor interaction effects could be
isolated, a considerable increase in information.
Replicating to obtain an error estimate is not justified in the
screening  phase  since  other  techniques  can  be  initially 
employed  to  get  a  rough  estimate  of  essentially  the  same 
information.  For  example,  an  approximation  of  error  might 
be  obtained  from  the  left  over  sources  of  variance  when  main 
effects  do  not  saturate  the  design;  when  they  do,  order 
statistics  can  provide  an  internal  error  estimate.  If 
trivial  effects  are  present,  they  can  provide  a  source  of 
"discovered"  replication.  Precision  in  multifactor  screening 
design  is  derived  from  "hidden"  replication.  These  terms 
and  methods  have  been  discussed  in  other  reports  by  Simon 
(1973,  1979). 

REPLICATION  USEFUL  TO  ESTABLISH  PSYCHOLOGICAL  CONFIDENCE 

Even  when  an  investigator  conscientiously  tries  to 
include all factors in his experiment, he may not be
successful because of cost and time pressures, or because he has
not yet identified the source of variance.  Critical subject
characteristics, which in the holistic approach to behavioral
research are just as important as equipment or environmental
factors, are often difficult to identify.  While an
investigator may believe that he has considered all the critical
subject  factors,  he  may  wish  to  test  this  assumption  by 
running  several  subjects  on  the  same  conditions.  While  this 
is  often  referred  to  as  "replication,"  it  is  so  only  to  the 
extent  that  subject  characteristics  have  indeed  included  all 
major  sources  of  subject  variance. 
The prudent investigator, however, will test this
assumption.  He will repeat all or parts of the experiment
using two or more individuals who are presumably
"identical," being, in fact, identical on the potentially
critical factors already identified.  By taking this
precaution, the investigator will compare (not average) the
results from several subjects to see if the same critical
effects are revealed in essentially the same order of
magnitude among individuals.  If two or more presumably
identical subjects perform differently, it warns the
investigator of the possibility that other unidentified factors
are operating.  Simon (1977a) discusses various patterns of
responses that might occur and possible explanations for
them.  If the results prove to be essentially identical, then
and only then should they be averaged.

PARTIAL  REPLICATION  FOR  ERROR  ESTIMATES 

When  the  response  surface  is  being  derived,  it  becomes 
important  to  estimate  how  well  the  regression  model  fits  the 
real data.  The lack of fit variance is compared in an
F-test with some external estimate of error variability.  Box
and  Hunter  (1958)  propose  that  it  would  be  economical  and 
often  sufficient  to  obtain  this  estimate  by  only  replicating 
at  the  center  of  the  central-composite  design. 

Replication  at  the  center  of  a  central-composite  design 
may  not  provide  a  powerful  enough  test  of  significance 
(Simon,  1976a,  pp  16-18)  in  all  cases.  Furthermore,  if  it 
is  feared  that  variability  may  increase  away  from  the  center 
of  the  design,  the  estimate  at  the  center  may  be  too  small 
to  correctly  assess  the  coefficients  of  the  second-order 
polynomial.  Since we do not really know whether the
variance is or is not homogeneous over the total surface, we
may wish to replicate at points other than the center.  It
is not necessary, however, to replicate the entire design.
Partial replication can increase the precision of our
estimated effects, add more degrees of freedom to the error
estimate, and improve the evaluation of the regression
model.

A number of people have proposed partially duplicated
designs.  Daniel  (1978)  discusses  designs  proposed  by 
Clatsworthy  (1973)  for  the  partial  replication  of  two-way 
layouts.  Patel  (1963)  describes  the  partial  duplication  of 
two-level  fractional  factorial  designs.  Dykstra  (1960) 
proposes  several  plans  for  reproducing  certain  experimental 
conditions  of  the  central-composite,  second-order  response 
surface  designs.  As  a  general  principle  with  central- 
composite  designs,  replicating  either  the  star  or  the  cube 
portion  (or  a  fraction  thereof)  is  sufficient  for  partial 
replication.  Replicating  the  star  increases  the  precision 
of  estimates  away  from  the  center  of  the  experimental  space. 
When a sub-fraction of a Resolution V fractional
factorial is used for partial replication, Box (1966) provides a
simple method of combining the old and new data.  The
Resolution V (or V-) design indicates that with the original
block of data, all critical main and two-factor interaction
effects have been isolated from one another.  In the
sub-fraction, however, this is not the case.

New estimates of each effect can be made by using the
following equation to combine the old and new data:

$$ \text{New estimate} = \left(\text{Old estimate from first block}\right) + a \, \frac{n_2}{n_1 + n_2 p} \left[ \left(\text{Effect of string}\right) - \left(\text{Effects of isolated sources combined}\right) \right] $$

Where  a  = plus or minus one, which corresponds to the
            sign of the aliased effect

       n1 = number of observations in first set of data

       n2 = number of observations in second set of data

       p  = number of aliased effects in string

Let us use this to combine the data from a 2^(8-2) design
with data from a 2^(8-4) design.  Given:

First set:  n1 = 64 conditions
   Isolated effects, /A/ = 5, /BC/ = 2, /GH/ = 1
Second set:  n2 = 16 conditions
   Effect of string, (A - BC - GH) = 2.5
   p = three effects in string
Substituting in the equation we get:

$$ A = /A/ + \frac{16}{64 + 16 \times 3} \left[ 2.5 - \left( /A/ - /BC/ - /GH/ \right) \right] $$

$$ A = 5 + \frac{16}{112} \left[ 2.5 - (5 - 2 - 1) \right] $$

and

$$ A = 5 + \frac{1}{7} \, [0.5] = 5 + .0714 = 5.07 $$

$$ BC = 2 + (-1) \, \frac{1}{7} \, (0.5) = 1.93 $$

and

$$ GH = 1 + (-1) \, \frac{1}{7} \, (0.5) = 0.93 $$

The error variance of the new estimate (Var_N) is:

$$ \mathrm{Var}_N = \frac{4\sigma^2}{n_1} \left[ \frac{n_1 + n_2 (p - 1)}{n_1 + n_2 p} \right] $$

which for this example would be:

$$ \mathrm{Var}_N = \frac{4\sigma^2}{64} \left[ \frac{64 + 16 (3 - 1)}{64 + 16 \times 3} \right] = \frac{\sigma^2}{16} \left( \frac{96}{112} \right) = \frac{6\sigma^2}{112} $$

Thus the additional observations reduced the error variance
of the new estimate to approximately 86 percent of the
original variance.
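
The size of the variance reduction depends only on n1, n2, and p, so it can be tabulated before any new data are collected. A minimal sketch, assuming the bracketed factor in the variance equation above; the function name `variance_ratio` is illustrative.

```python
def variance_ratio(n1, n2, p):
    """Ratio of the new error variance to the original variance 4*sigma^2/n1."""
    return (n1 + n2 * (p - 1)) / (n1 + n2 * p)


# Worked example: 64 original conditions, 16 added, three effects in the string.
ratio = variance_ratio(64, 16, 3)

# How the ratio changes as more conditions are added to the second set.
ratios_by_n2 = {n2: variance_ratio(64, n2, 3) for n2 in (8, 16, 32, 64)}
```

The ratio approaches (p - 1)/p as n2 grows, so long strings of aliased effects limit how much a sub-fraction can improve any single estimate.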

REPLICATION  TO  ESTABLISH  CONFIDENCE  LIMITS 

At  the  end  of  a  research  program,  after  presumably  two 
or  three  equipment  configurations  have  been  selected  based  on 
the  results  of  the  multifactor  experiments,  the  investigator 
may  wish  to  make  a  more  careful  comparison  of  these  final 
choices,  or  to  evaluate  them  against  some  earlier  version  to 
decide  whether  or  not  the  replacement  is  worthwhile.  To  make 
a more precise estimate of the differences in means,
particularly if there is an interest in drawing the conclusion of
"no (practical) difference," the investigator will wish to
replicate these few experimental conditions.*

In  judging  whether  the  differences  are  important  or  not, 
the  investigator  must  trade-off  the  costs  of  making  a  wrong 
judgment  (Type  I  and  II  errors)  against  the  costs  of  more 
data  collection.  Rather  than  do  the  conventional  test  of 
statistical  significance,  the  investigator  may  wish  to 
establish  a  confidence  interval,  the  limits  between  which  a 
hypothesis  can  be  considered  tenable  and  outside  which, 
untenable  for  a  certain  probability  value.  It  is  interesting 
to note that Cochran and Cox (1957) wrote in their book on
experimental designs for hypothesis testing:  "On the whole,
however, tests of significance are less frequently useful in
experimental work than confidence limits" (p 5).  Box, Hunter
and Hunter (1978) in their book on Statistics for
Experimenters say:  "Significance testing in general has been a
greatly  overrated  procedure,  and  in  many  cases  where 
significant  statements  have  been  made  it  would  have  been 
better  to  provide  an  interval  within  which  the  value  of  the 
parameter  would  be  expected  to  lie"  (p  109) . 

Some equations for estimating confidence limits are
shown in Table 11.  Note that in the subscript for t, in
addition to "d.f." (degrees of freedom), we also select the
t for a particular α (probability of making a Type I error).

In some textbooks, this α is replaced by α/2.  For example,
in the text by Cochran and Cox (1966), α is used.  In the
text by Box, Hunter, and Hunter (1978), α/2 is used.  How
can these apparently different equations be reconciled?  The
difference lies in the t-table one intends to use.  In
Cochran and Cox, the t-table in the back of the book shows
the t-values for a two-tailed test and the probability value
used to enter the table is the proportion of both ends of
the distribution summed.  In Box, Hunter and Hunter, the
t-table in the back of the book (p 631) shows the t-values
for a one-tailed test and the probability value used to enter

*We  didn't  replicate  much  or  at  all  when  we  had  a  great 
many  conditions  to  investigate.  As  we  reduce  the  number  of 
conditions  being  examined,  we  begin  to  replicate  more  and 
more . 
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TABLE  11.  EQUATIONS  FOR  ESTIMATING  CONFIDENCE  LIMITS 


1  -  g LIMIT  FOR  MEAN 
(Two-tailed  test) 


Mean  +  t,  .  ,a//n 

-  (n-1 ,  a  ) 


where 


a  -  A, 


E(y  -  y) 2/n  -  1 


1  -  a  LIMIT  FOR  MEAN  DIFFERENCE  (Paire’d) 


Mean  Diff.  +  t,.  -  .  (S.E.  Mean  diff.) 

-  (d.  f .  ,  a) 


(d  —  cT)  4 

where  S.E.  Mean  diff.  =  ^  ^n_-^  -- 


CT 

/n“ 


where  d  =  yfll  -  yfal 

where  degrees  of  freedom  are  appropriate  to  t 


1  -  a  LIMIT  FOR  MEAN  DIFFERENCE  (Unpaired) 

n  i  l_ 

CT.  -  +  - 

N  n.  n„ 


Mean  Diff.  +  t,.  c  . 

—  (d .  f .  ,  a) 


B 


where  a  = 


'<nA  -  1)0A2  +  <nB  -  1)o-2 


B 


(nA  ~  1}  +  (nB  '  1} 
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the  table  is  the  proportions  at  one  end  of  the  symmetrical 
distribution.  Thus,  if  an  investigator  wants  the  t-value 
for  15  degrees  of  freedom  and  a  .05  probability  of  making 
a  Type  I  error  in  a  two-tailed  test,  he  would  look  up  the 
t  in  the  a  =  .05  column  in  Cochran  and  Cox's  book  and 
a  /2  =  .025  in  the  Box,  Hunter,  and  Hunter  book.  In  both 
cases,  t  =  2.131.  If  he  wanted  a  .05  probability  of  making 
a  Type  I  error  in  a  one-tailed  test,  he  would  look  up  t  in 
the  a  =  .10  column  in  Cochran  and  Cox's  book  and  a/2  =  .05 
in  the  Box,  Hunter,  and  Hunter  book.  In  both  cases,  t  = 

1.753. 
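
As a worked illustration of the first equation in Table 11, the following Python sketch computes two-tailed 95 percent confidence limits for a mean from 16 scores (15 degrees of freedom), using the tabled t = 2.131 quoted above. The scores and the function name are hypothetical and serve only to show the arithmetic; the t-values are typed in from the text rather than computed.

```python
import math
import statistics


def mean_confidence_limits(scores, t_value):
    """Return (lower, upper) limits: mean +/- t * s / sqrt(n)."""
    n = len(scores)
    mean = statistics.mean(scores)
    # Sample standard deviation: sqrt(sum((y - mean)^2) / (n - 1)).
    s = statistics.stdev(scores)
    half_width = t_value * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical scores; n = 16, so there are 15 degrees of freedom.
scores = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13, 17, 11, 14, 15, 13, 12]

# Tabled t for 15 d.f.: two-tailed .05 (2.131) and one-tailed .05 (1.753).
T_TWO_TAILED_05 = 2.131
T_ONE_TAILED_05 = 1.753

lower, upper = mean_confidence_limits(scores, T_TWO_TAILED_05)
print(f"Two-tailed 95 percent limits: {lower:.2f} to {upper:.2f}")
```

Entering the same function with 1.753 gives the narrower interval that corresponds to a one-tailed .05 risk, which is why it matters which kind of t-table the value was read from.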

Hader  and  Grandage  (1958),  in  an  excellent  discussion  of 
simple  and  multiple  regression  analysis,  explain  and 
illustrate  how  to  derive  the  confidence  limits  for  regression 
coefficients and predicted values for a multivariate
situation.

REPLICATION  TO  ESTABLISH  PERFORMANCE  LIMITS 

Traditional confidence limits are intended to provide a
basis for pinpointing the value of a parameter, a mean or a
mean difference. There is also a need to replicate a single
condition at the very end of an experimental program to
answer the very practical question: Given this device,
between what limits is a group of people, theoretically from
the same population, likely to perform? We are concerned with
the first and ninety-ninth percentile levels here rather than
the fiftieth. While the results from the entire research
program (along with the holistic approach) should make
prediction of these limits quite accurate, there is still
the need to determine them empirically.

SECTION IX

THE  SIGNIFICANCE  OF  TESTS 
OF  STATISTICAL  SIGNIFICANCE 


In recent years, the test of statistical significance
has become the most widely used analysis performed by
behavioral scientists doing controlled experiments. The
results from these tests have often become the primary
criteria used by experimenters, teachers, and editors alike
in evaluating the importance of an experimental study. In
fact, however, tests of statistical significance as
ordinarily applied by psychologists provide very little useful
information in general and often lead to erroneous
conclusions in specific cases.

Bakan (1966) wrote: "The test of statistical
significance in psychological research may be taken as an instance
of a kind of essential mindlessness in the conduct of
research" (p 436). Lykken (1968) wrote: "Statistical
significance is perhaps the least important attribute of a good
experiment; it is never a sufficient condition for claiming
that a theory has been usefully corroborated, that a
meaningful empirical fact has been established, or that an
experimental report ought to be published" (p 151). Coats
(1970) wrote: "Most graduate schools of education still
require students to take what may be one of the most
irrelevant learning experiences in their entire educational
career. The requirement is the study of inferential
statistics" (p 6). Cronbach (1975) argued that "the time has
arrived to exorcise the null hypothesis" (p 124) and Shulman
(1970) admonished that "the time has arrived for educational
researchers to divest themselves of the yoke of statistical
hypothesis testing" (p 389). Carver (1978) suggests that
"the complete abandonment of statistical significance testing
in the training of doctoral students in education research
should be seriously considered" (p 396).

The problem naturally is not with the test of
statistical significance itself but with the way in which too many
psychologists misuse and misinterpret the test. Even when
used correctly, however, it provides little information of
practical value.

Table 12 lists fallacies and facts regarding
tests of statistical significance, particularly as they have
been applied in the behavioral sciences. This list has been
culled from a number of critical papers by Bakan (1966),
Carver (1978), Kleiter (1969), and others referenced in these
three papers. The reader is urged to savor the original
papers.

TABLE 12. FACTS AND FALLACIES REGARDING
TESTS OF STATISTICAL SIGNIFICANCE


Fallacy: The p value associated with the F-ratios in the
         test of statistical significance (TSS) indicates
         the probability that the observed effect is due
         to chance.

Fact:    The p value is the probability that an effect of
         the observed size could be obtained if, in fact,
         it was a certainty that chance is operating.


Fallacy: The p value from the TSS is the probability
         that the same result would be obtained if the
         experiment were replicated. It is an indication
         of the reliability of the result.

Fact:    Reliability depends on how well critical factors
         are controlled between experiments. It cannot
         be predicted by any statistics.


Fallacy: The level of statistical significance is
         inversely related to the probability that the
         research hypothesis is correct (e.g., a .05
         significance level, if reached, makes the
         chances .95 that the scientific hypothesis is
         true).

Fact:    The entire study could be invalid no matter what
         probability level is obtained in the TSS.
         Validity is not a statistical concept, but
         depends on the adequacy of sampling and the
         quality of the data collection and analysis.


Fallacy: Using a TSS enables the scientist to make more
         objective decisions regarding the rejection of
         the null hypothesis.

Fact:    The scientific decisions are still totally
         subjective. The TSS helps make only statistical
         decisions of little practical value. Since an
         investigator is expected to use his judgment
         to plan and design experiments, why is it so
         sinful for him to use it to interpret the data?


Fallacy: The more subjects in the experiment, the more
         faith one can have that the null hypothesis,
         significant at the p = .05 level, should be
         rejected.

Fact:    Just the reverse. It takes a larger difference
         to get the same p-value with ten subjects than
         with 100.


Fallacy: Rejecting a null hypothesis at a reasonable
         statistical significance level is ordinarily
         required before one can claim support for a
         research hypothesis.

Fact:    One can almost always get statistical
         significance by increasing the number of subjects,
         selecting another p-level, changing from a
         two-tailed to a one-tailed test and so forth. A
         failure to obtain a significance level may have
         nothing to do with the hypothesis and everything
         to do with sloppy research.


Fallacy: A statistically significant factor is important.

Fact:    With a large enough N, the difference required
         for significance may be, for all practical
         purposes, trivial.


Fallacy: A failure to get statistical significance
         indicates that the factor (effect) is not important.

Fact:    One may not have obtained statistical
         significance because one used too few subjects or
         because one did a sloppy experiment. Importance
         has to do with the size of a valid effect.


Fallacy: When performance is measured in artificial
         units, TSS helps to interpret the results.

Fact:    TSS is not enough. It is necessary to have some
         outside anchor point to evaluate the practical
         significance of the magnitude of the effect.


Fallacy: If a difference is statistically significant, it
         indicates that the members of one group performed
         better than the members of the other group(s).

Fact:    The test only compares means, not groups. The
         groups could overlap considerably.


Fallacy: If the mean of Group A is higher than the mean of
         Group B, then a statistically significant result
         implies that Group A performed better on average
         than Group B.

Fact:    This depends on the hypotheses being tested.
         Most psychologists traditionally have used a
         two-tailed, non-directional test. This only
         indicates that the observed difference could have
         occurred with a certain probability if chance had,
         in fact, been operating.
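
Two of the facts in Table 12 (that a larger difference is needed to reach the same p-value with ten subjects than with 100, and that with a large enough N the required difference may be trivial) can be illustrated numerically. The Python sketch below is only an approximation under stated assumptions: two independent groups of equal size, a known common standard deviation (a hypothetical 10 units), and the normal distribution in place of t. The function name is illustrative.

```python
import math
from statistics import NormalDist


def smallest_significant_difference(n_per_group, sd, alpha=0.05):
    """Smallest mean difference reaching two-tailed significance.

    Large-sample approximation: z * sd * sqrt(2 / n), with sd assumed known.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sd * math.sqrt(2 / n_per_group)


# Hypothetical standard deviation of 10 performance units.
for n in (10, 100, 10_000):
    d = smallest_significant_difference(n, sd=10.0)
    print(f"n per group = {n:>6}: difference needed = {d:.2f}")
```

The required difference shrinks in proportion to the square root of the number of subjects, so a "significant" result from a very large sample says little about whether the effect is large enough to matter.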



Psychologists have been prone to misinterpret (and
therefore misuse) tests of statistical significance.
Furthermore, they have failed to recognize the extremely limited
information that is derived from the test when it is
performed properly. Too frequently the test is not performed
properly. Too often the hypothesis being tested is not the
one the experimenter really wants, but is the only one with
which he is familiar. The significance of several types of
hypotheses can be tested: loose or sharp, Fisherian or
Neyman-Pearson, one- or two-tailed, directional or
non-directional. Psychologists have overwhelmingly limited themselves
to sharp, Fisherian, non-directional, two-tailed tests,
whether the information obtained was what they wanted or not.
To make matters worse, among psychologists there has been
little consistency as to what the "error" term should consist
of. Simon (1976b), after analyzing the experiments published
over 14 years in the journal Human Factors, found that the error
variance was some composite of 17 different combinations of
classes of factor designations (i.e., equipment, subject, and
temporal sources, main and interaction effects). The choice
affects the outcome of the test.


In the light of this confusion surrounding the use and
misuse of the test of statistical significance, its role as
an "automatic decision maker" for those who wish to escape
their responsibility as data interpreters is not justified.
In practice, the investigator can perform a number of
analyses that will supply the information he needs to decide
whether an effect is likely to exist or not (Simon, 1977a;
1977b).

The test can be used most effectively when the
investigator is interested in the absence of a practical difference
among conditions (see Section X) rather than in the
existence of a difference. When the interest is primarily that
of detecting differences, then the weaknesses of the test
(enhanced by characteristics of behavioral research) come
into play. Cochran and Cox (1957, p 5) point out another
"useful property of a test of significance" which "is that
it exerts a sobering influence on the type of experimenter
who jumps to conclusions on scanty data, and who might
otherwise try to make everyone excited about some sensational
treatment effect that can well be ascribed to the ordinary
variation in his experiment." This circumstance is more
likely to occur when only one or two factors are being
investigated in vacuo than when they are examined in a
larger (holistic) context. In the same vein, with a holistic
philosophy, the "ordinary variations" in behavioral
experiments are likely to be reduced considerably when investigators
actively seek to account for the critical sources of variance
rather than trying to hide them within a massive uneconomical
replication effort.

Cochran and Cox continue by saying: "On the whole,
however, tests of significance are less frequently useful in
experimental work than confidence limits." In this regard,
Box, Hunter, and Hunter (1978, p 109) state: "Significance
testing in general has been a greatly overworked procedure,
and in many cases where significance statements have been
made it would have been better to provide an interval within
which the value of the parameter would be expected to lie."

The test of statistical significance should not be
totally ignored. However, it must be put into proper
perspective, used cautiously and properly, and given a low
priority among a number of other techniques that are
available to aid the investigator in his decisions regarding
the presence and importance of an effect.

SECTION X

DETERMINING  THE  PROBABILITY  OF  ACCEPTING  THE 
NULL  HYPOTHESIS  WHEN  IN  FACT  IT  IS  FALSE 
(Applications  to  the  interpretation  of  screening  studies) 


Since  most  psychological  experiments  are  planned  to 
discover  an  effect,  a  difference,  or  a  correlation,  an 
investigator,  applying  inferential  statistics,  is  more 
interested  in  rejecting  the  null  hypothesis  than  in 
accepting  it.  In  many  cases,  in  deciding  whether  the  effect 
observed  in  the  sample  data  is  real  or  not,  he  focuses 
almost  totally  on  the  risk  of  making  a  Type  I  error  and 
ignores the risk of making a Type II error. That is, he
tries to minimize the chances of saying there is a
difference when, in fact, there isn't. This orientation has
caused many investigators to ignore the risk of accepting
the null hypothesis (i.e., saying there is no effect, no
difference, no correlation) when, in fact, it is false
(Type II error).

There are circumstances when, as a practical matter,
an investigator should decide to accept or reject a
hypothesis by weighing both risks, and more important, there
are circumstances when he should suspend judgment, i.e.,
make no decision until additional evidence has been
collected.*

There are situations, however, in which the option of
suspending judgment cannot be exercised. The investigator
must make a decision on the basis of inadequate evidence.
One such situation arises in applying the lack of fit test
when a central-composite design is being employed.

By way of example, let us look at the results from a
study (North and Williges, 1971, Table 5) in which the
investigators made a decision to stop collecting more data
on the basis of their test for lack of fit. Their analysis
of variance table with the lack of fit test is shown in
Table 13. The investigators refused to reject the null
hypothesis because the probability value of the F-ratio
for the lack of fit test was only p = 0.15 and they had
established the 0.05 significance level as the critical
cut-off point. Still, with only three degrees of freedom in
the term used for "error" (Replications), the test was not
a very powerful one. Clearly this was a situation in which
the evidence made it difficult to make a firm decision
regarding whether or not there was a fit. On the one hand,
a p = 0.15 is not small enough for most psychologists to
reject the null hypothesis (although I am not sure why this
must be as strong a rule as it tends to be). On the other
hand, there is, in fact, a sizeable lack of fit in this
example, for it accounts for nearly 50 percent of the total
performance variance while the two "significant"
experimental factors account for only 33 percent and 11
percent of the total variance.


* Hays (1963, Chapter 9) has an excellent discussion on
hypothesis testing and interval estimation from the point
of view of the behavioral scientist.

TABLE 13. REPRODUCTION OF NORTH AND WILLIGES' ANALYSIS
OF VARIANCE FOR NUMBER OF CORRECT LOCATIONS


Source              %       df    Variance      F        p

REGRESSION       ( .488)   ( 4)      .90       5.97     .05
  Focus            .106      1       .78       5.18     .05
  Density          .331      1      2.44      16.19     .01
  Visual Angle     .011      1       .08       0.52
  TV Lines         .040      1       .30       2.00
RESIDUAL         ( .511)   (25)      .15
  Blocks           .001      2       .00       0.068
  Lack of Fit      .493     20       .18       4.301    .15
  Replications     .017      3       .04
TOTAL            (1.000)   (29)

This example illustrates the dilemma that can occur
when making a Type I error is the only risk considered. The
purpose of the study was to develop an equation that best
approximated the performance data. The lack of fit test was
done to see if this had been accomplished or if more data
needed to be collected. Not rejecting the null hypothesis
-- whether or not it was verbalized that a final judgment
was being suspended -- resulted in a de facto acceptance of
the null hypothesis since no further data was collected to
resolve any ambiguity. That this was not the best solution
becomes clearer when the risk of making a Type I error
(p = 0.15 in this case) is compared with the risk of making
a Type II error. Calculations show that the risk of saying
that the fit is adequate when, in fact, it isn't, is
p = 0.69. Thus by failing to reject the null hypothesis
because they did not want to risk (p = .15) making a Type I
error, they had accepted a larger risk (p = .69) of making
an error of the second kind, implicitly accepting the null
hypothesis by not collecting more data. Had they been aware
of the size of the second probability, it is presumed the
investigators would have continued to collect more data to
improve the fit. As it stands, the derived equation
probably yields biased predictions.

WEIGHING  THE  RISKS  IN  SCREENING  DESIGNS 

Calculating the risk of making a Type II error can be
particularly useful when screening designs are involved. Once
the results have been obtained, ordered, and plotted on
normal (or half-normal) probability paper (see Simon, 1977a),
the investigator must decide where to draw the line between
the effects that are probably real and those that are
probably due to chance. From the plots and from the data
itself, the investigator will ordinarily find it easy to
make decisions regarding the very large and the very small
effects. But there are marginal effects between the two
extremes about which decisions are difficult to make.

Daniel (1976, p 416) has this to say about these marginal
effects:


"The  dropping  of  factors  from  further 
consideration after an early 'screening'
experiment is sometimes justified. But
it is also a very common source of serious
errors. The 2^(8-4) may be of higher power
than  much  of  the  experimenter's  previous 
work,  but  some  care  must  still  be  taken 
to  avoid  'Type  II'  errors.  Such  errors 
can  be  particularly  treacherous  with 
multi-factor  experiments.  For  example, 
one  factor  may  be  clearly  dominant  over 
the  others  taken  singly.  But  it  may  be 
that  three  or  four  of  the  dropped  factors, 
if  varied  together,  would  have  had  as  big 
an  effect  as  the  one  called  dominant." 

When the investigator is faced with the difficult
decision of what to do with the marginal sources of variance,
several options are open to him. He may decide that since
the effect of each marginal variable is small, if he omits
it, any equation based on the more obvious variables will
account for the major chunk of the performance variance and
that any bias from the omitted variables will be of little
practical importance. With limited resources, his main
concern initially is to account for the larger portion of
the total performance variance, and worrying about marginal
sources at that time may not be cost-effective. If one
thinks of experiments within the paradigm of a multi-factor
research program (rather than as a single experiment), then the
investigator expects to have the opportunity to refine his

B) Select the significance level, α, that the
investigator will accept as the risk of making
a Type I error when rejecting the null
hypothesis.


Example: A) df1 = 20, df2 = 3; B) α = .05.

Step II.

A) Obtain a table of critical values of the F-
distribution.

B) For the appropriate degrees of freedom, find
the F-value in the table associated with
the probability, β = (1 - α).


Example: A) One of the most complete tables of critical
values of the F-distribution was published
by D. B. Owen, Handbook of Statistical
Tables, Wesley, 1952, Section 4.1, pp 63-87.
A portion of the table from his page 66
is shown here.

Example: For β = .250, 20 and 3 df: look up .750, 3 and 20 df,
= 1.4808, and take the reciprocal = .6753.

B) Complete the four-column table.

Example:     β        F        λ²       λ

           .250     .6753    12.82    3.58
           .100     .4201    20.61    4.54

Step VI.

A) Plot the values on a graph with the probabilities
(β) on the ordinate and λ (= √F = the square root of F)
on the abscissa.

B) Connect the points in a smooth curve. (A
certain degree of imprecision here can't be
avoided unless a great many more values are
plotted.)


Example: See Figure 6 for a plot of the operating
characteristic curve to be used with the
data in Table 13. For the particular
F-value, the risk of making a Type II
error can be estimated. (Note that the
square root of the F-value is used to
enter the table.)

Thus, to estimate the risk of a Type II
error when the F equals 4.301 (Table 13,
Lack of Fit), we look up the square root
of F (2.07) and find the associated
probability (when α = .05 and there are
20 and 3 degrees of freedom) to be .69.
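
The two example rows of the four-column table above can be checked with a few lines of Python. This sketch rests on an assumption about the steps that are not shown here: that the third column is the critical F for α = .05 with 20 and 3 degrees of freedom divided by the F in the second column, and that the fourth column is the square root of the third. The critical value 8.66 is the standard tabled F for those degrees of freedom and does not appear in the text above. The function and variable names are illustrative only.

```python
import math

# Assumed: tabled critical F for alpha = .05 with 20 and 3 d.f.
F_CRITICAL_05 = 8.66


def oc_point(beta, f_beta):
    """Return one row: beta, F, the ratio F_critical / F, and its square root."""
    ratio = F_CRITICAL_05 / f_beta
    return beta, f_beta, ratio, math.sqrt(ratio)


# For beta = .250: reciprocal of the .750 entry for 3 and 20 d.f. (1.4808).
f_250 = 1 / 1.4808

rows = [oc_point(0.250, f_250), oc_point(0.100, 0.4201)]
for beta, f_value, ratio, root in rows:
    print(f"{beta:.3f}  {f_value:.4f}  {ratio:6.2f}  {root:.2f}")

# Abscissa used to enter the curve for the lack of fit F in Table 13.
lack_of_fit_entry = math.sqrt(4.301)
print(f"sqrt(4.301) = {lack_of_fit_entry:.2f}")
```

Under this reading the computed rows agree with the printed values (12.82 and 3.58; 20.61 and 4.54), and the square root of 4.301 reproduces the 2.07 used to read the curve. The .69 itself is read from the plotted curve and is not computed here.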
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SECTION  XI 

TESTING  NON-ADDITIVITY  IN  EXPERIMENTAL 
DATA  FROM  A  LATIN  SQUARE  DESIGN 


Psychologists have made considerably more use of Latin
square designs than is warranted, often disregarding or
being unaware of the conditions that must be met before the
data can be considered unbiased and tests of statistical
significance valid. The popularity of this design rests
with its application to what is sometimes referred to as a
"within subject" design. Each subject is tested on all
experimental conditions, presented sequentially, and
by having the same number of subjects as experimental
conditions, the order of presentation can be
counterbalanced.* With eight subjects and eight conditions
presented in counterbalanced orders, the Latin square is an
8 x 8 matrix. This design is actually a fractional factorial
in which the effects of three factors (Conditions, Subjects,
and Trials) are being examined in a two-dimensional space.
In an 8 x 8 design, there are 63 degrees of freedom
partitioned into 7 for Conditions, 7 for Subjects, 7 for Trials,
with 42 left (actually a confounding of interactions)
to be treated as error. This design is efficient, therefore,
only when there is, in fact, no interaction of any kind among
the three sources. In human performance studies, this is
often an untenable assumption (without considerable
preparation ahead of time) since it will be invalid if the subjects
learn at different rates (subject x trial interaction). Then
too, interactions are often found between subjects and
conditions due to a non-linear reaction of subjects of
different abilities to tasks of different difficulties.

Because  of  these  inherent  dangers,  an  investigator  should 
obtain  a  rough  estimate  of  whether  or  not  non-additivity  is 
present  to  avoid  misinterpreting  his  results. 
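
The partition of degrees of freedom described above for the 8 x 8 case can be checked with a short Python sketch. The function names are illustrative, and the cyclic square is only one simple way to give each condition once to every subject and once in every trial position; it is an assumption for illustration, not the layout used in the report.

```python
def latin_square_df(t):
    """Degrees of freedom for a t x t Latin square with one value per cell."""
    total = t * t - 1
    main = t - 1                    # each of Conditions, Subjects, Trials
    leftover = total - 3 * main     # confounded interactions treated as error
    return {
        "total": total,
        "conditions": main,
        "subjects": main,
        "trials": main,
        "error": leftover,
    }


def cyclic_latin_square(t):
    """Cell (subject, trial) receives condition (subject + trial) mod t."""
    return [[(row + col) % t for col in range(t)] for row in range(t)]


df = latin_square_df(8)
print(df)

square = cyclic_latin_square(8)
for row in square:
    print(row)
```

For eight subjects this gives 63 total degrees of freedom, 7 for each of the three sources, and 42 left over. Those 42 are a usable error term only if the interactions they contain are in fact negligible.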

Tukey (1949; 1955) provides a test for non-additivity
for matrices of data in which there is a single observation
per cell. If the test for non-additivity is statistically
significant, it suggests that the linear-by-linear
interaction component is confounded with the main effects and
"error" variance. The test is particularly sensitive to the
non-additivity that occurs when there is a correlation
between a subject's average performance and the rate at which
his performance changes relative to the change in group
performance.


*If there is an odd number of experimental conditions,
twice as many subjects are required for counterbalancing.

Following Tukey, the steps required to test
for this non-additivity when employing a Latin
square design are given.

TUKEY'S TEST FOR LATIN SQUARES

In Table 14, a Latin square with one value
per cell is presented in an array in which rows are
Subjects, columns are Trials, and conditions are distributed
in a counterbalanced manner.* The circled
numbers in the table relate the portion of the
table to the steps in the analysis described below. To
perform the test for non-additivity in a Latin square, the
following steps will be taken:

1. Obtain the grand mean of the original data.

2. Obtain the mean of each row (Subject) and find
how much each deviates from the grand mean.

3. Do the same for each column (Trial).

4. Do the same for each treatment (Letters).

5. Find the predicted value for each cell by
fitting the additive model: Grand Mean +
(the sum of the deviations for the row,
column, and treatment for the cell), each
with the correct signs.

6. Subtract the predicted value from the observed
value in each cell to obtain the residual in
each cell. (Arithmetic check: These must
sum to zero in each row, column and treatment).

7. Subtract the grand mean from each predicted
value, square the difference and build a
new array (YY) with these values. (Note:
If squared values are large and unwieldy,
they may be divided by a power of 10 that
reduces them in size to a whole number with
the original number of significant figures).

8. Do an analysis of variance on the array of values
obtained in Step 7 and obtain the interaction
sum of squares for array YY. (Interaction sum
of squares equals the total sum of squares minus
the column sum of squares minus the row sum of
squares minus the treatment sum of squares).

9. Multiply the value in each cell from Step 7
by its corresponding residual value from
Step 6. Sum all of these values, and square
that answer (9A).

10. Divide the squared value from Step 9A by the
value in Step 8 and obtain the sum of squares
for non-additivity with one degree of freedom.
The non-additivity variance is the same as its
sum of squares.

11. Obtain the interaction sum of squares for
array XX. (See Step 8 for the equation with
which to estimate the interaction sum of
squares.) The degrees of freedom for this
interaction equals the total degrees of
freedom, (N - 1), minus three times the number
of degrees of freedom for treatments, (T - 1).

12. Subtract the sum of squares for non-additivity
(Step 10) from the sum of squares for the interaction
of the XX array (Step 11) to obtain the
remainder sum of squares. The remainder degrees
of freedom equals the degrees of freedom for
the interaction XX array sum of squares minus
one.

13. Obtain the remainder variance by dividing its
sum of squares (Step 12) by its degrees of
freedom (Step 12).

14. The F-test for non-additivity is made by
dividing the non-additivity variance (Step 10)
by the remainder variance (Step 13). This
will be evaluated in the conventional manner
using a standard F-distribution table for 1
and (N - 3T + 1) degrees of freedom.


*The data was taken from a Latin square in Davies (1967,
p 194). The example was prepared by Elizabeth Lage Poscoe.

If the F-value is statistically significant or the
non-additivity sum of squares accounts for a sizable proportion
of the total sum of squares, then we must be prepared to
reject the hypothesis of additivity and recognize that this
assumption (fundamental when using a Latin square) is not
being met.
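
The fourteen steps above can be sketched in Python as follows. This is a minimal sketch rather than the worked example of Table 14: the 4 x 4 data and the cyclic layout are hypothetical, the names are illustrative, and it assumes that array XX is the original data, so that its interaction sum of squares equals the sum of squared residuals from the additive fit. It also reads Step 12 as subtracting the non-additivity sum of squares from the XX interaction sum of squares.

```python
def tukey_latin_square(y, layout):
    """One-degree-of-freedom non-additivity test for a t x t Latin square.

    y[i][j]      observed value for subject (row) i on trial (column) j
    layout[i][j] index 0..t-1 of the condition presented in that cell
    """
    t = len(y)
    n = t * t
    cells = [(i, j) for i in range(t) for j in range(t)]

    def additive_fit(a):
        """Fit grand mean + row + column + treatment deviations (Steps 1-5)."""
        grand = sum(a[i][j] for i, j in cells) / n
        row = [sum(a[i]) / t - grand for i in range(t)]
        col = [sum(a[i][j] for i in range(t)) / t - grand for j in range(t)]
        trt_mean = [0.0] * t
        for i, j in cells:
            trt_mean[layout[i][j]] += a[i][j] / t
        trt = [m - grand for m in trt_mean]
        fitted = [[grand + row[i] + col[j] + trt[layout[i][j]]
                   for j in range(t)] for i in range(t)]
        return fitted, grand

    pred, grand = additive_fit(y)
    # Step 6: residuals (observed minus predicted).
    resid = [[y[i][j] - pred[i][j] for j in range(t)] for i in range(t)]
    # Step 7: squared deviations of predicted values from the grand mean.
    yy = [[(pred[i][j] - grand) ** 2 for j in range(t)] for i in range(t)]
    # Step 8: interaction SS of YY = SS left after an additive fit to YY.
    yy_fit, _ = additive_fit(yy)
    ss_int_yy = sum((yy[i][j] - yy_fit[i][j]) ** 2 for i, j in cells)
    # Steps 9-10: squared sum of cross-products over the Step 8 value.
    cross = sum(yy[i][j] * resid[i][j] for i, j in cells)
    ss_nonadd = cross ** 2 / ss_int_yy
    # Step 11: interaction SS of the original data (assumed to be array XX).
    ss_int_xx = sum(resid[i][j] ** 2 for i, j in cells)
    # Steps 12-13: remainder SS and its degrees of freedom, N - 3T + 1.
    ss_rem = ss_int_xx - ss_nonadd
    df_rem = n - 3 * t + 1
    # Step 14: F with 1 and (N - 3T + 1) degrees of freedom.
    f_ratio = ss_nonadd / (ss_rem / df_rem)
    return {
        "residuals": resid,
        "ss_nonadd": ss_nonadd,
        "ss_interaction_xx": ss_int_xx,
        "ss_remainder": ss_rem,
        "df_remainder": df_rem,
        "f_ratio": f_ratio,
    }


# Hypothetical 4 x 4 data: rows are subjects, columns are trials.
layout = [[(i + j) % 4 for j in range(4)] for i in range(4)]
y = [
    [10.0, 12.0, 15.0, 19.0],
    [11.0, 14.0, 18.0, 13.0],
    [13.0, 17.0, 12.0, 16.0],
    [16.0, 11.0, 15.0, 21.0],
]

result = tukey_latin_square(y, layout)
print(f"F(1, {result['df_remainder']}) = {result['f_ratio']:.3f}")
```

The resulting F is compared with a tabled critical value for 1 and N - 3T + 1 degrees of freedom, as in Step 14. The sketch does not guard against the degenerate cases in which the Step 8 sum of squares or the remainder sum of squares is zero.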

What  to  Do? 

Some  may  be  tempted,  in  the  face  of  non-additivity,  to 
ignore  its  one  degree  of  freedom  and  to  use  the  mean  square 
for  the  balance  as  the  "error"  term  for  significance  tests. 
Tukey (1949, p 237) does not recommend this. First of all,
he points out, it is more practical to express results in
additive terms since they tend to apply over a broader
region. Second, if the "error" variance is non-normally
distributed, the balance variance "unduly inflates the
apparent significance of the other mean squares". In the
presence of a large non-additivity term, Tukey suggests an
examination of the data to determine whether the
non-additivity is due to one or more unusually discrepant values
or whether it is due to analysis in the wrong form.

Tukey  (1945)  proposes  a  graphic  method  to  help  this 
decision.*  For  small  amounts  of  data  the  technique  can  be 
ambiguous. The reader is referred to the paper for further
enlightenment. If it is decided that the non-additivity was
due to analyzing the data in the wrong form, then
transformation of the data is in order.

GENERAL FORM OF TUKEY'S NON-ADDITIVITY TEST

Tukey (1955) provides a general procedure for performing
the non-additivity test for any design. The steps are simple
and are listed below to enable the reader to gain insight
into what he is actually doing. To test for non-additivity
in any factorial-type design with a single measure per cell:

1.  Obtain the residual for each cell (one per
cell). [$x$]

2.  Square the predicted values for each cell,
treat those values as original data, and obtain
the residuals for that set of data for each
cell. (Nothing will be changed if a constant
is added to the predicted values prior to
squaring.) [$y$]

3.  Obtain the sum of cross-products of values
obtained in steps 1 and 2 found in corresponding
cells. [$\sum xy$]

4.  Obtain the sum of squares of values obtained in
step 2. [$\sum y^2$]

5.  Divide the sum obtained in step 3 by the sum
obtained in step 4. This is the estimated
coefficient (i.e., least squares estimate) for
non-additivity. [$\sum xy / \sum y^2$]

6.  Square the sum obtained in step 3 and divide by
the sum obtained in step 4 to obtain the sum of
squares for non-additivity with one degree of
freedom (which is partitioned from the residual
sum of squares in the ANOVA table and tested as
any other source of variance in the ANOVA).
[$(\sum xy)^2 / \sum y^2$]
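The six steps can be sketched in Python for the simplest case, a rows-by-columns table with one score per cell. This is a minimal sketch under stated assumptions, not a reproduction of Tukey's worked example: the function names and the example scores are hypothetical, and the divisor in steps 4 to 6 is taken to be the sum of squares of the step-2 residuals, which is the usual least-squares form of the test.

```python
import numpy as np

def additive_residuals(table):
    """Residuals left after fitting grand mean + row effects + column effects."""
    table = np.asarray(table, dtype=float)
    row_means = table.mean(axis=1, keepdims=True)
    col_means = table.mean(axis=0, keepdims=True)
    return table - row_means - col_means + table.mean()

def tukey_one_df(table):
    """One-degree-of-freedom non-additivity test for a rows x columns table."""
    table = np.asarray(table, dtype=float)
    x = additive_residuals(table)             # step 1: residuals of the data
    predicted = table - x                     # additive predictions per cell
    y = additive_residuals(predicted ** 2)    # step 2: residuals of squared predictions
    sum_xy = float((x * y).sum())             # step 3: sum of cross-products
    sum_yy = float((y * y).sum())             # step 4: sum of squares of step-2 values
    coefficient = sum_xy / sum_yy             # step 5: least squares coefficient
    ss_nonadd = sum_xy ** 2 / sum_yy          # step 6: non-additivity SS, 1 d.f.
    ss_residual = float((x * x).sum())        # ordinary residual (interaction) SS
    ss_remainder = ss_residual - ss_nonadd
    rows, cols = table.shape
    df_remainder = (rows - 1) * (cols - 1) - 1
    ms_remainder = ss_remainder / df_remainder
    f_ratio = ss_nonadd / ms_remainder if ms_remainder > 0 else float("inf")
    return {
        "coefficient": coefficient,
        "ss_nonadd": ss_nonadd,
        "ss_residual": ss_residual,
        "ss_remainder": ss_remainder,
        "df_remainder": df_remainder,
        "F": f_ratio,
    }

# Hypothetical 4 x 3 table of scores, used only to exercise the function.
example = [[12.0, 15.0, 21.0],
           [14.0, 19.0, 30.0],
           [15.0, 22.0, 41.0],
           [17.0, 27.0, 55.0]]
result = tukey_one_df(example)
print(result)
```

The F value returned would be referred to an F table with 1 and `df_remainder` degrees of freedom. A Latin square needs a different additive fit (rows, columns, and treatments), so `additive_residuals` would have to be replaced for that design.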


*The  method  was  developed  for  a  row-by-column  table  and 
will  need  to  be  modified  to  be  applied  to  the  Latin  square. 
SECTION  XII 

HOW  TO  INCLUDE  FACTORS  WITH  MORE  THAN  TWO  LEVELS 
IN  A  SCREENING  DESIGN 


Screening studies are usually $2^{k-p}$ fractional factorial
designs. Two levels are used since the goal is to identify
the critical candidate factors and not to obtain a functional
relationship. If these two goals are separated, the economy
of large-scale multifactor experiments can ordinarily be
increased (Simon, 1973, 1977a). Once the critical factors
have been found, additional levels may be added to this
smaller number, if necessary, to study a more complex function
and response surface.

There  are  circumstances,  however,  when  an  investigator 
may  wish  to  study  some  factors  at  more  than  two  levels 
during  the  screening  phase.  For  example,  if  the  factor  is 
a  qualitative  (categorical)  one,  there  is  no  clear-cut 
method of selecting only two out of a larger number of
different categories. While the ordinary procedure would be
to  select  the  categories  believed  likely  to  represent 
extremes  in  performance  —  this  judgment  being  based  on 
life  experiences  or  preliminary  studies  --  this  approach  may 
sometimes  be  considered  too  tenuous.  In  that  case,  the 
investigator  may  wish  to  include  several  categories  in  his 
original  screening  study  to  obtain  information  regarding 
each  one  and  some  clues  regarding  how  they  might  interact 
with  other  factors.  If  the  factor  is  a  quantitative  one 
and  is  believed  to  relate  to  performance  with  a  U-shaped 
function,  selecting  two  points  at  which  performance  dif¬ 
ferences  are  likely  to  be  close  to  maximum  may  be  less 
accurate  than  is  desired.  If  there  is  sufficient  uncertain¬ 
ty,  the  investigator  may  wish  to  insure  himself  by  studying 
three  or  four  levels  along  the  dimension. 

Whatever  the  reason  for  wanting  to  include  three  or  four 
levels,  the  investigator  must  weigh  the  advantages  against 
the  loss  in  data-collection  economy  and  the  increased 
difficulty  in  interpreting  the  results.  Certainly  the 
practice  of  having  more  than  two  levels  should  be  used 
sparingly  during  the  screening  phase. 

METHODS  OF  INCLUDING  MORE  THAN  TWO  LEVELS 

There  are  several  ways  in  which  factors  with  more-than- 
two-levels  might  be  introduced  into  a  screening  design.  One 
way  would  be  to  include  the  three-or-more  level  factor  out¬ 
side  the  basic  screening  plan.  This  means  that  the 
screening  design  would  be  repeated  at  every  level  of  the  new 
factor.  This  reduces  the  economy  of  the  experiment,  but 
if  the  added  data-collection  effort  can  be  tolerated,  it 
might  be  used  if  the  factor  is  an  important  one  and 
disordinal interactions with other factors are
suspected. A second way of introducing the multilevel
factor would be to use mixed two- and three-level factorial
designs that already exist ([citation illegible]). This
approach, however, is the least attractive, since these
designs are seldom economical and not suitable for screening
purposes. A third method can be employed when the basic
screening design is not saturated, that is, when it has the
capacity to isolate more independent effects than it is
required to measure. In this case, there is enough space to
introduce a three- or four-level factor into the screening
design. This has the advantage of maintaining the economy of
the effort during the screening phase of the program. It has
the disadvantage of increasing the difficulty of data
interpretation.

INCLUDING A FOUR-LEVEL FACTOR IN A SCREENING DESIGN

Addelman ([year illegible]; Cochran and Cox, 1957,
pp 273-274) shows how three two-level factors can be
replaced by one four-level factor without confounding main
effects. This is how it is done:

                                  New Factor
Conditions     A     B     AB       Levels

    1          -     -     +          α
    2          +     -     -          β
    3          -     +     -          γ
    4          +     +     +          δ

Note that all three columns here are orthogonal to one
another, as are the columns in a screening design. However,
while the experimenter may freely select any two columns of
a screening design for this purpose, the third must be the
column representing the interaction of the first two, even
though in the screening design it may be aliased with a
main effect. With only three such columns, only four unique
combinations of signs will be found in the rows, i.e.,
(- - +), (+ - -), (- + -), and (+ + +). The procedure is
to designate each combination as a level, α, β, γ, and δ, of
a new four-level factor. In the steps below, the method is
used to create a 4 x $2^4$ fractional factorial design.

Step 1:  Select a two-level design of adequate size.

Since we must use three columns of an orthogonal design
for each four-level factor, a design capable of
estimating seven main effects is needed for the 4 x $2^4$
fractional factorial in this example. A $2^{7-4}$ design will be
used, made up of the following eight experimental conditions:
Conditions             New Factor Labels

               A    B    C    D    E    F    G

    1          -    -    -    +    +    +    -
    2          +    -    -    -    -    +    +
    3          -    +    -    -    +    -    +
    4          +    +    -    +    -    -    -
    5          -    -    +    +    -    -    +
    6          +    -    +    -    +    -    -
    7          -    +    +    -    -    +    -
    8          +    +    +    +    +    +    +

               A    B    C    AB   AC   BC   ABC


Step  2:  Select  any  two  columns  (i.e.,  factors)  plus  a 

third  that  includes  the  generalized  interaction 
as  its  alias. 

We  will  use  factor  (columns)  A  and  B.  This  forces  us 
to  use  factor  D  since  D  is  aliased  with  the  AB  interaction 
in  the  Resolution  III  design.  In  a  Resolution  IV  design, 
the  third  effect  would  be  the  isolated  string  containing 
the  appropriate  interaction. 

Step  3:  Replace  each  unique  sign  pattern  (in  a  row  of  signs 
of  the  selected  factors)  with  a  symbol  representing 
each  different  level  of  a  four-level  factor. 

Conditions     A    B    D          X

    1          -    -    +          α
    2          +    -    -          β
    3          -    +    -          γ
    4          +    +    +          δ
    5          -    -    +          α
    6          +    -    -          β
    7          -    +    -          γ
    8          +    +    +          δ

The levels of factor X may be designated categories α, β,
γ, δ for a qualitative factor or (for example) -3, -1, +1,
+3, respectively, if X is a quantitative factor.

Step 4:  Write the complete 4 x $2^4$ main effects design.

               X    C    E    F    G

    1)         α    -    +    +    -
    2)         β    -    -    +    +
    3)         γ    -    +    -    +
    4)         δ    -    -    -    -
    5)         α    +    -    -    +
    6)         β    +    +    -    -
    7)         γ    +    -    +    -
    8)         δ    +    +    +    +

All five main effects are orthogonal to one another.*
It is a Resolution III design. With this "main-effect"
(i.e., Resolution III) design, the assumption that all
higher-order effects are negligible must be valid or else
the results will be biased. If that assumption cannot be
made, additional data must be collected according to the
strategy to be used in all screening studies.
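Steps 1 through 4 can be sketched in Python as a check on the construction. This is an illustrative sketch only: the level names (`alpha` through `delta`), the mapping of sign patterns to levels, and the helper `proportional` are assumptions made for the example. The check applied is the proportional-frequency condition cited in the footnote.

```python
from collections import Counter
from itertools import product

# Step 1: the eight runs of the 2^(7-4) design in standard order
# (A changes fastest); D, E, F, G are the AB, AC, BC, ABC columns.
runs = []
for c, b, a in product((-1, 1), repeat=3):
    runs.append({"A": a, "B": b, "C": c,
                 "D": a * b, "E": a * c, "F": b * c, "G": a * b * c})

# Steps 2 and 3: columns A, B and D (= AB) are replaced by one
# four-level factor X; each sign pattern becomes one level.
LEVEL = {(-1, -1, 1): "alpha", (1, -1, -1): "beta",
         (-1, 1, -1): "gamma", (1, 1, 1): "delta"}

# Step 4: the 4 x 2^4 main-effect plan.
design = [{"X": LEVEL[(r["A"], r["B"], r["D"])],
           "C": r["C"], "E": r["E"], "F": r["F"], "G": r["G"]}
          for r in runs]

def proportional(plan, f1, f2):
    """True when the levels of f1 occur with the levels of f2
    with proportional frequencies."""
    n = len(plan)
    joint = Counter((row[f1], row[f2]) for row in plan)
    m1 = Counter(row[f1] for row in plan)
    m2 = Counter(row[f2] for row in plan)
    return all(joint[(u, v)] * n == m1[u] * m2[v] for u in m1 for v in m2)

factors = ["X", "C", "E", "F", "G"]
all_orthogonal = all(proportional(design, f1, f2)
                     for i, f1 in enumerate(factors)
                     for f2 in factors[i + 1:])
for row in design:
    print(row)
print("all main effects orthogonal:", all_orthogonal)
```

Only main effects are checked here; the plan remains Resolution III, so two-factor interactions are still aliased with main effects.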


*Principle of Proportional Frequencies (Plackett, 1946).
This principle states that "...a necessary and sufficient
condition that the main effects estimates of two factors be
uncorrelated is that the levels of one factor occur with
each of the levels of the other factor with proportional
frequencies." It further states that, for main effects to
be uncorrelated with two-factor interaction effects, each
combination of the levels of two factors must occur with
the levels of the main effect with proportional
frequencies. [...] proportional relationships hold for
[...] $2^{k-p}$ designs, characteristically [...]
COLLAPSING  FOUR-LEVEL  FACTORS  TO  THREE-LEVEL  FACTORS 

Three-level  factors  are  not  created  directly  from  two- 
level  factors.  Instead,  a  four- level  factor  must  first  be 
created  through  replacement  and  collapsed  to  become  a 
three- level  factor.  Addelman  (1963,  p  61)  illustrates  this 
thus : 


Two-level                 Four-level               Three-level
factors                   factor                   factor

A    B    AB   Replacement           Collapsing

-    -    +    ---------->    α     ---------->     α
+    -    -    ---------->    β     ---------->     β
-    +    -    ---------->    γ     ---------->     γ
+    +    +    ---------->    δ     ---------->     β


Category δ is changed to Category β. For quantitative
factors, the values -1, 0 and +1 could be substituted for
the categories α, β, and γ. When the three-level factor is
introduced this way into a $2^{k-p}$ screening design, the
Principle of Proportional Frequency guarantees that
orthogonality will be maintained.

Let us use the 4 x $2^4$ main effect plan developed in
Steps 1 through 4 of the previous section to create a
3 x $2^4$ main effect plan. Collapsing the four-level factor
to three levels in the manner shown above would create the
following design:


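Four-level              Three-level     Remainder of Two-Level
factor                  factor          Factors
             Collapsing
    X      ---------->      Y           C    E    F    G

    α                       α           -    +    +    -
    β                       β           -    -    +    +
    γ                       γ           -    +    -    +
    δ                       β           -    -    -    -
    α                       α           +    -    -    +
    β                       β           +    +    -    -
    γ                       γ           +    -    +    -
    δ                       β           +    +    +    +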
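The collapsing step can be checked the same way. In this hedged sketch the columns are typed in directly from the 4 x $2^4$ plan, the level names are placeholders, and `proportional` is a hypothetical helper. It confirms that after δ is merged into β the three-level factor still meets every two-level factor with proportional (no longer equal) frequencies.

```python
from collections import Counter

# Four-level column X and the remaining two-level columns of the 4 x 2^4 plan.
X = ["alpha", "beta", "gamma", "delta"] * 2
two_level = {
    "C": [-1, -1, -1, -1, 1, 1, 1, 1],
    "E": [1, -1, 1, -1, -1, 1, -1, 1],
    "F": [1, 1, -1, -1, -1, -1, 1, 1],
    "G": [-1, 1, 1, -1, 1, -1, -1, 1],
}

# Collapsing rule: delta is merged into beta, giving three levels.
COLLAPSE = {"alpha": "alpha", "beta": "beta", "gamma": "gamma", "delta": "beta"}
Y = [COLLAPSE[level] for level in X]

def proportional(col1, col2):
    """True when levels of col1 occur with levels of col2 in proportional frequencies."""
    n = len(col1)
    joint = Counter(zip(col1, col2))
    m1, m2 = Counter(col1), Counter(col2)
    return all(joint[(u, v)] * n == m1[u] * m2[v] for u in m1 for v in m2)

level_counts = Counter(Y)
checks = {name: proportional(Y, column) for name, column in two_level.items()}
print(level_counts)
print(checks)
```

Level β now appears in four of the eight conditions and the other two levels in two each, so estimates involving β rest on twice as many observations.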
AUGMENTING THE 3 x $2^{k-p}$ DESIGN

When multilevel factors are developed from $2^{k-p}$ designs
in  the  manner  described,  the  plans  remain  "main  effect" 
designs.  This  means  that  the  assumption  —  it  must  be  a 
valid  one  —  is  made  that  no  interaction  effects  exist.  This 
assumption  is  generally  not  tenable  in  the  behavioral 
sciences  and  for  this  reason  we  usually  insist  that  main 
effects  be  isolated  at  least  from  two-factor  interaction 
effects . 

Daniel (1976, p 229) describes how this would be done
with a 3 x $2^{k-p}$ design. He proposes that the usual reversal
of  signs  be  made  with  the  two-level  factors  —  as  with  the 
"foldover"  design  —  and  that  for  the  three-level  factor, 
the  levels  would  be  interchanged  in  the  following  manner: 


Daniel  discusses  the  analysis  of  these  designs  if  the 
multilevel  factor  is  a  quantitative  variable.  Two  dummy 
variables  for  linear  and  quadratic  effects  would  be  sub¬ 
stituted  for  the  three-level  factor.  With  these  new 
"variables",  the  aliases  can  be  determined  (p  226-229). 

Because  of  ambiguity  in  interpretation,  until  more 
experience  is  obtained,  the  reader  should  use  these  designs 
cautiously  for  whatever  information  or  clues  that  might  be 
obtained  without  full  dependency  upon  the  results. 
SECTION  XIII 

ANALYZING  EXTRA-PERIOD  CHANGE-OVER  DESIGNS 


To  conserve  subjects  and  to  make  more  precise  comparisons 
among  treatments,  a  common  procedure  in  human  engineering 
experiments  is  to  test  the  same  subject  sequentially  on  a  series 
of  different  experimental  conditions.  In  practice,  however,  the 
intended  advantages  of  this  type  of  design  may  be  outweighed  by 
biases  introduced  by  effects  artificially  created  by  the  se¬ 
quential  presentation.  Many  psychologists  attempt  to  overcome 
this  problem  by  using  counterbalanced  Latin  square  designs  in 
which  each  treatment  appears  once  in  every  column  (period)  and 
once  in  every  row  (subjects) .  When  each  treatment  is  arranged 
so  that  it  precedes  and  follows  every  other  treatment  just 
once,  the  design  is  referred  to  as  a  carry-over  or  change-over 
design.  In  addition  to  being  able  to  isolate  the  effects  of 
treatments,  subjects,  and  periods,  the  change-over  design 
enables  a  residual  effect  carried  over  from  the  treatment 
given  on  the  previous  trial  to  be  isolated  from  the  direct 
treatment  effect.  Simon  (1974)  describes  the  construction 
of  a  number  of  different  change-over  designs  in  his  summary 
of  techniques  for  handling  various  sequence  effects. 

In  the  basic  Latin  square  change-over  design,  estimates 
of  direct  and  residual  treatment  effects  are  not  completely 
independent,  with  the  precision  of  residual  estimates  being 
lower  than  the  direct.  In  this  case,  such  designs  are  more 
useful  when  the  main  reason  for  isolating  residual  effects  is 
to  provide  an  unbiased  estimate  of  the  direct  effects.  By 
adding  an  additional  period  and  repeating  the  last  treatment 
of  the  series  given  to  each  subject,  a  more  balanced  design 
can  be  formed.  With  this  "extra-period  design,"  direct  and 
residual  effects  are  independent  of  one  another  and  have 
approximately  equal  precision. 

Examples  of  how  to  analyze  the  basic  Latin  square  change¬ 
over  design  can  be  found  in  the  statistical  literature  rather 
easily  (Cochran  and  Cox,  1957;  Federer,  1964) .  This  is  not 
the  case  for  the  extra-period  change-over  design.  Lucas  (1957) 
describes  the  analysis  of  the  extra-period  design  in  the 
Journal of Dairy Science. Patterson and Lucas (1962) provide
a description in a Technical Bulletin published by the North
Carolina Agricultural Experiment Station. Cochran and Cox
(1957) provide the necessary equations but give no numerical
example for this analysis. Because these references may be
difficult  for  a  reader  to  obtain,  the  steps  in  the  analysis 
of  the  extra-period  change-over  design  are  described  here, 
along  with  a  numerical  example. 
ANALYZING  THE  RESULTS  FROM  AN  EXTRA-PERIOD  CHANGE-OVER  DESIGN 

In  Table  15-A,  an  extra-period  change-over  design  is  shown 
for  four  conditions  (A  through  D) .  The  balanced  4x4  Latin 
square  arrangement  has  been  supplemented  with  an  additional 
period  (I  through  V)  to  orthogonalize  direct  treatment  and 
residual  treatment  (a  through  d)  effects.  Fictitious  perfor¬ 
mance  scores  for  each  of  the  20  conditions  are  given  in 
parentheses . 

Some  analyses  have  been  completed  and  the  results  are  given 
in  the  margins  of  the  design  in  Table  15-A: 

$P_j$ = sum of all scores made in each period,
        e.g., $P_{III}$ = 8 + 3 + 4 + 2 = 17

$S_i$ = sum of all scores made by each subject,
        e.g., $S_2$ = 1 + 6 + 3 + 4 + 2 = 16

$\sum Y$ = sum of all N scores = $\sum S_i$ = $\sum P_j$

$\sum Y^2$ = sum of all scores after squaring each one

$N$ = total number of conditions = $n_s \times n_p$

$n$ = number of treatments = number of subjects

In  Table  15-B,  results  from  additional  preliminary  analyses 
are  given: 

$T_i$ = sum of all scores made for each treatment
        (direct), e.g., $T_D$ = 5 + 3 + 8 + 4 + 2 = 22

$R_i$ = sum of all scores made on the period
        immediately following each treatment i
        (residual), e.g., $R_a$ = 6 + 4 + 4 + 5 = 19

$S_i - P_{Ii}$ = sum of all scores for each subject except
        for that occurring in the first period,
        e.g., $S_2 - P_{I2}$ = 6 + 3 + 4 + 2 = 15

In addition, we must obtain:

$\sum R_i^2$ = sum of the residual values after each is
        squared

$(\sum Y - \sum P_I)$ = sum of all scores except those made
        during the first period = $\sum (S_i - P_{Ii})$

In  this  example,  since  t  is  even,  there  is  a  single  Latin 
square.  If  t  were  odd,  then  a  minimum  of  two  squares  would 
be  needed  to  create  the  desired  balance.  The  letter  q 
equals  the  number  of  Latin  squares;  in  our  example,  q  =  1. 
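The preliminary totals can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic; the variable names are illustrative, and the treatment layout and scores are typed in from Table 15, Part A (rows are periods I through V, columns are subjects 1 through 4).

```python
# Treatment given to each subject in each period, and the score obtained.
treatments = ["ABDC",   # period I
              "BCAD",   # period II
              "DACB",   # period III
              "CDBA",   # period IV
              "CDBA"]   # period V repeats period IV (the extra period)
scores = [[3, 1, 5, 4],
          [6, 6, 3, 3],
          [8, 3, 4, 2],
          [7, 4, 2, 4],
          [2, 2, 1, 5]]

period_totals = [sum(row) for row in scores]            # P_j
subject_totals = [sum(col) for col in zip(*scores)]     # S_i
grand_total = sum(period_totals)                        # sum of Y
sum_of_squares = sum(y * y for row in scores for y in row)

direct_totals = dict.fromkeys("ABCD", 0)                # T_i
residual_totals = dict.fromkeys("ABCD", 0)              # R_i
for p, row in enumerate(scores):
    for s, y in enumerate(row):
        direct_totals[treatments[p][s]] += y
        if p > 0:
            # The score carries the residual of the treatment the same
            # subject received in the preceding period.
            residual_totals[treatments[p - 1][s]] += y

# S_i minus the first-period score of the same subject.
subject_less_first = [s - first for s, first in zip(subject_totals, scores[0])]

print(period_totals, subject_totals, grand_total, sum_of_squares)
print(direct_totals, residual_totals, subject_less_first)
```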
TABLE 15.  EXTRA PERIOD CHANGE-OVER DESIGN,
DATA, AND PRELIMINARY ANALYSES


Part A

Each cell shows the Residual (R, lower case), the
Treatment (T, upper case), and the Performance (Y, in
parentheses).

                        SUBJECTS (S)
Periods       1         2         3         4        $P_j$

  I          A(3)      B(1)      D(5)      C(4)       13
  II        aB(6)     bC(6)     dA(3)     cD(3)       18
  III       bD(8)     cA(3)     aC(4)     dB(2)       17
  IV        dC(7)     aD(4)     cB(2)     bA(4)       17
  V         cC(2)     dD(2)     bB(1)     aA(5)       10

  $S_i$       26        16        15        18

$\sum Y$ = 75      $\sum Y^2$ = 353      N = 20


Part B

Treatment:          A     B     C     D
$T_i$:             18    12    23    22

Residual:           a     b     c     d
$R_i$:             19    19    10    14

Subject:            1     2     3     4
$S_i - P_{Ii}$:    23    15    10    14

N = Total number of observations
n = t = number of conditions (treatments)
q = Number of Latin squares = 1
In Table 16, the analysis of variance is completed. Sums
of squares, degrees of freedom, and variances are obtained
for the following sources of variance: total, periods, subjects,
direct and residual treatment effects, and error. This
differs from a conventional analysis only in the
calculations for the direct and residual treatment effects, where
direct effects must be isolated from their overlap with subjects
and the residual effects must be isolated from an overlap with
periods. These "overlaps," or correlations, can be detected
in the design shown in Table 15-A, e.g., subject 1 has two
treatment C's but only one of all other conditions, subject 2
has two treatment D's but only one of the others, and so forth,
while no residual appears in period I. The magnitudes of the
direct and residual treatment effects are those obtained after
eliminating the correlated portion.

In Table 17, additional equations (found in Lucas, 1957)
are supplied for the analysis when t is odd and two Latin
squares (q = 2) are involved. The only new term here is:

$Q_i$ = sum of all values in each Latin square

When there are two, then the sum of squares, degrees of freedom
and variance must be estimated for Latin squares. New sums of
squares for periods, subjects, and error are calculated first
by the equations in Table 16, and then corrected by subtracting
the sum of squares for squares from each other source, as
shown in the equations in Table 17. These new or modified
values are then included in an analysis of variance along
with the values of the direct and residual treatment effects
which are calculated using the same equations as shown in
Table 16. The symbol $\sum x^2_{P/Q}$ indicates that it is the sum of
squares for periods within Latin squares; any others that
include the /Q indicate this same change in source of variance.

LIMITATION  IN  THE  USE  OF  EXTRA-PERIOD  CHANGE-OVER  DESIGNS 

These  designs  assume  a  linear  model,  as  indicated  in  the 
tables.  Therefore,  they  would  not  be  adequate  if  it  is  sus¬ 
pected  that  there  is  an  interaction  between  direct  and  residual 
treatment  effects.  The  linear  model  is  applicable  only  if  the 
residual  effect  for  each  treatment  remains  essentially 
constant  regardless  of  the  treatment  that  follows.  The  DxR 
interaction  implies  that  the  magnitude  of  a  residual  effect 
will  vary  depending  on  which  treatment  follows  which  treatment. 

In  many  human  performance  studies,  particularly  those  involving 
motor  tasks,  one  cannot  assume  that  the  linear  model  is  valid. 

Simon  (1979a)  had  suggested  that  change-over  designs  might 
provide  a  new  and  economical  way  of  studying  transfer  of  train¬ 
ing.  The  few  designs  in  the  literature  that  are  based  on  a  DxR 
model  are  uneconomical,  requiring  too  many  observations  to 
properly  balance  only  a  relatively  few  treatments.  Efforts  to 
develop  more  economical  designs  for  this  interaction  model 
were  not  successful  (Simon,  1979b,  Section  III). 
TABLE 16.  ANALYSIS OF VARIANCE OF EXTRA-PERIOD
CHANGE-OVER DESIGN WHEN t IS EVEN AND
q IS ONE LATIN SQUARE


Source / equation                                         $\sum x^2$   d.f.   Var.

TOTAL:  [N-1]
  $\sum x^2_T = \sum Y^2 - (\sum Y)^2 / N$
  $= 353 - (75)^2/20 = 353 - 281.25$                        71.75      19    3.78

PERIODS:  q[n]
  $\sum x^2_P = \sum P_j^2 / n - (\sum Y)^2 / N$
  $= 1171/4 - 281.25$                                        11.5        4    2.88

SUBJECTS:  q[n-1]
  $\sum x^2_S = \sum S_i^2 / (n+1) - (\sum Y)^2 / N$
  $= 1481/5 - 281.25$                                        14.95       3    4.98

DIRECT:  [n-1]
  $\sum x^2_D = \sum [(n+1) T_i - S_{ti} - \sum Y]^2 / [qn(n+1)(n+2)]$
  (in the example, $S_{ti}$ is the total for the subject who
  received treatment i in the extra period)
    $T_a$:  $[(5)(18) - 18 - 75]^2 = [-3]^2  =   9$
    $T_b$:  $[(5)(12) - 15 - 75]^2 = [-30]^2 = 900$
    $T_c$:  $[(5)(23) - 26 - 75]^2 = [14]^2  = 196$
    $T_d$:  $[(5)(22) - 16 - 75]^2 = [19]^2  = 361$
                                           Sum = 1466
  $= 1466 / (4 \cdot 5 \cdot 6)$                             12.22       3    4.07

RESIDUAL:  [n-1]
  $\sum x^2_R = \sum R_i^2 / (qn) - (\sum Y - \sum P_I)^2 / (qn^2)$
  $= 1018/4 - (62)^2/16 = 254.5 - 240.25$                    14.25       3    4.75

ERROR:  [(qn-2)(n-1)]
  $\sum x^2_E = \sum x^2_T - \sum x^2_P - \sum x^2_S - \sum x^2_D - \sum x^2_R$
  $= 71.75 - 11.5 - 14.95 - 12.22 - 14.25$                   18.83       6    3.14

MODEL:
  $\sum x^2_T = \sum x^2_D + \sum x^2_R + \sum x^2_P + \sum x^2_S + \sum x^2_E$
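The Table 16 arithmetic can be verified with a short script. This is a sketch for the single-square case only (t even, q = 1). It starts from the totals given in Table 15, and the dictionary `repeat_subject_total` is a name introduced here for the subject total that enters the direct-effect equation (the subject who received that treatment in the extra period).

```python
n, q = 4, 1                      # treatments (= subjects per square), Latin squares
N = q * n * (n + 1)              # 20 observations

P = [13, 18, 17, 17, 10]                     # period totals
S = [26, 16, 15, 18]                         # subject totals
T = {"A": 18, "B": 12, "C": 23, "D": 22}     # direct treatment totals
R = {"A": 19, "B": 19, "C": 10, "D": 14}     # residual totals
# Total of the subject whose extra-period treatment is A, B, C, D.
repeat_subject_total = {"A": 18, "B": 15, "C": 26, "D": 16}
sum_y, sum_y2 = 75, 353

correction = sum_y ** 2 / N                  # (sum of Y)^2 / N = 281.25
ss = {}
ss["total"] = sum_y2 - correction
ss["periods"] = sum(p * p for p in P) / n - correction
ss["subjects"] = sum(s * s for s in S) / (n + 1) - correction
ss["direct"] = sum(((n + 1) * T[k] - repeat_subject_total[k] - sum_y) ** 2
                   for k in T) / (q * n * (n + 1) * (n + 2))
ss["residual"] = (sum(r * r for r in R.values()) / (q * n)
                  - (sum_y - P[0]) ** 2 / (q * n * n))
ss["error"] = (ss["total"] - ss["periods"] - ss["subjects"]
               - ss["direct"] - ss["residual"])

df = {"total": N - 1, "periods": q * n, "subjects": q * (n - 1),
      "direct": n - 1, "residual": n - 1, "error": (q * n - 2) * (n - 1)}

for source in ss:
    print(f"{source:9s} SS={ss[source]:7.2f}  df={df[source]:2d}  "
          f"Var={ss[source] / df[source]:5.2f}")
```

The degrees of freedom for periods, subjects, direct, residual, and error add to N - 1 = 19, which is a quick check that no source has been counted twice.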
TABLE 17.  ADDITIONAL EQUATIONS FOR ANALYSIS OF VARIANCE
OF EXTRA-PERIOD CHANGE-OVER DESIGN WHEN t IS
ODD AND q IS TWO LATIN SQUARES


ADD TO ANALYSES IN TABLE 16                                   d.f.

LATIN SQUARES (Q)                                             [q-1]
  $\sum x^2_Q = \sum Q_i^2 / [n(n+1)] - (\sum Y)^2 / N$


MODIFY ANALYSES IN TABLE 16

PERIODS WITHIN SQUARES (substitute for PERIODS)               [qn]
  $\sum x^2_{P/Q} = \sum x^2_P - \sum x^2_Q$

SUBJECTS WITHIN SQUARES (substitute for SUBJECTS)             [q(n-1)]
  $\sum x^2_{S/Q} = \sum x^2_S - \sum x^2_Q$

ERROR WITHIN MULTIPLE SQUARES (substitute for ERROR)          [(qn-2)(n-1)]
  $\sum x^2_{E/Q} = \sum x^2_E - \sum x^2_Q$


MODEL

  $\sum x^2_T = \sum x^2_D + \sum x^2_R + \sum x^2_Q + \sum x^2_{P/Q} + \sum x^2_{S/Q} + \sum x^2_{E/Q}$
SECTION  XIV 

ANALYZING  SERIALLY-BALANCED  SEQUENCE  DESIGNS 


A  serially-balanced  sequence  design  is  a  type  of  change¬ 
over  design  used  to  isolate  direct  treatment  effects  from 
residual  effects  carried  over  from  a  preceding  treatment.  It 
differs  from  the  change-over  designs  in  the  preceding  section 
since  it  is  balanced  over  a  series  of  replicated  conditions 
run  by  a  single  subject  rather  than  among  a  group  of  subjects 
with  subjects,  trials,  and  treatments  arranged  in  a  Latin  square 
format.  Sampford  (1957)  describes  how  these  S.B.S.  designs  are 
constructed and analyzed. Simon (1974), in a report
summarizing techniques for handling sequence effects, describes the
construction of these designs, but not the analysis. Since
Sampford's explanation of the analysis may be difficult for
some psychologists to follow, the method of analyzing serially
balanced  sequence  designs  (Type  1,  k  =  1)  is  given  here  along 
with  a  numerical  example.  The  Type  1  design,  like  the  extra¬ 
period  change-over  design,  balances  the  treatment  so  that 
direct  and  residual  treatment  effects  are  orthogonal.  However, 
residual  effects  remain  confounded  with  blocks.  Isolating 
block  effects  from  estimates  of  residual  effects  creates  the 
only  complication  in  the  analysis.  The  k  =  1  indicates  that  a 
single  sequence  is  used. 

METHODS  OF  ANALYSIS 

An  example  of  a  serially  balanced  sequence  design  is  shown 
in  Table  18  along  with  fictitious  performance  data.  Although 
the  conditions  are  arranged  in  a  Latin  square,  they  are  actual¬ 
ly  to  be  presented  to  a  single  subject  serially  beginning  with 
the  condition  in  the  upper  left-hand  corner,  moving  across  the 
row,  back  to  the  left  end  of  the  next  row,  and  so  forth  until  the 
lower  right-hand  condition  is  reached.  The  first  condition 
in  parentheses  is  called  a  "primer"  and  is  not  used  in  the 
calculations.  It  is  there  to  provide  a  residual  effect  for  the 
next  condition.  Thus,  performance  on  the  second  A  is  the 
result  of  the  combined  effects  of  the  direct  treatment  effect 
of  A  in  that  period,  plus  any  effect  carried  over  from  the 
primer  A  in  the  preceding  period,  plus  any  block  effect. 
Similarly,  the  performance  level  for  the  next  period  in  the 
sequence  is  the  result  of  the  direct  effect  of  condition  F, 
the  residual  effect  from  the  preceding  condition  A,  and  the 
effect  of  Block  1. 

The  symbols  and  equations  required  to  perform  the  analysis 
are  shown  below  and  should  be  calculated  in  the  order  given. 
TABLE 18.  SERIALLY BALANCED SEQUENCE DESIGN
WITH FICTITIOUS DATA*

(Type 1, t=6, k=1)

PRIMER: (A)

Block 1    A(3)    F(8)    B(9)    E(8)    C(9)    D(8)
Block 2    D(10)   F(12)   C(11)   E(10)   A(8)    B(5)
Block 3    B(7)    A(6)    C(7)    F(12)   D(13)   E(12)
Block 4    E(14)   B(11)   F(12)   A(11)   D(9)    C(11)
Block 5    C(11)   B(10)   D(11)   A(10)   E(11)   F(16)
Block 6    F(18)   E(17)   D(15)   B(12)   C(11)   A(10)

Condition #1 is the first entry in Block 1; Condition #36 is
the last entry in Block 6. Fictitious data are shown in
parentheses.


*These fictitious data were created by weighting the Treatments, Direct
and Residual, and Blocks in this way.

Let Direct Treatment   Add if it follows   Add if in

      A = 1              1   (A)             1   (Block 1)
      B = 2              2   (B)             2   (Block 2)
      C = 3              3   (C)             3   (Block 3)
      D = 4              4   (D)             4   (Block 4)
      E = 5              5   (E)             5   (Block 5)
      F = 6              6   (F)             6   (Block 6)

Thus, Condition 1 = [A=1] + [A=1] + [Block 1=1] = 3
Condition 16 = [F=6] + [C=3] + [Block 3=3] = 12
etc.

No error component was used.
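The weighting rule in the footnote is simple enough to state as code. The sketch below regenerates the fictitious scores from the treatment sequence; the variable names are illustrative, and the sequence itself is typed in from Table 18.

```python
# Treatment order within each of the six blocks, read left to right.
sequence = ["AFBECD", "DFCEAB", "BACFDE", "EBFADC", "CBDAEF", "FEDBCA"]
primer = "A"                                   # run before Block 1, not scored
weight = {letter: i + 1 for i, letter in enumerate("ABCDEF")}   # A=1 ... F=6

scores = []
previous = primer
for block_number, block in enumerate(sequence, start=1):
    row = []
    for treatment in block:
        # direct weight + residual weight of the preceding treatment + block number
        row.append(weight[treatment] + weight[previous] + block_number)
        previous = treatment
    scores.append(row)

for block_number, (block, row) in enumerate(zip(sequence, scores), start=1):
    cells = "  ".join(f"{trt}({y})" for trt, y in zip(block, row))
    print(f"Block {block_number}: {cells}")

condition_16 = scores[2][3]       # fourth condition in Block 3
```

Because no error component was added, an analysis of these data should recover the weights exactly, apart from a constant.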
1.  $N$ = total number of observations, $y_i$, = $t^2$, where t is the
    number of treatments.

2.  $G$ = sum of all scores = $y_1 + y_2 + \ldots + y_N$

3.  $M$ = grand mean = $G/N$

4.  $T_i$ = sum of all scores for each treatment, e.g.,
    $T_{A1} + T_{A2} + \ldots + T_{At} = T_A$

5.  $B_i$ = sum of all scores for each block, e.g.,
    $B_{11} + B_{12} + \ldots + B_{1t} = B_1$ (ignoring possible residual
    effects that might be present).

6.  $R_i$ = sum of all scores for each residual, e.g.,
    $R_{A1} + R_{A2} + \ldots + R_{At} = R_A$ (ignoring possible block
    effects that might be present).

    The scores for the residuals of Factor i are those
    assigned to the treatments that follow the Factor i
    treatment. For example, in the second block, the
    score containing the residual for treatment E will
    be (8), and for treatment A will be (5). Note that
    the residual of the primer treatment is included here,
    e.g., 3 for primer treatment A, as well as 8 for first
    treatment A.

7.  $\sum x^2_T$ = total sum of squares = $\sum Y^2 - G^2/N$

8.  $\sum x^2_D$ = treatment sum of squares = $\sum T_i^2 / t - G^2/N$

9.  $\sum x^2_B$ = block sum of squares = $\sum B_i^2 / t - G^2/N$,
    ignoring possible residual effects.

The only complicated part of the analysis is where residual effects
are adjusted by removing any overlap with block effects. The
following explains the steps for doing this.

10. $R'_i$ = residual total adjusted for block effects =
    $t R_i - G + B_{i-1} - B_i$

Important note:  The numerical order for residuals does NOT
    correspond to their alphabetical order. Instead, each
    residual (and its treatment) receives the number of
    the block in which it appears first. Thus, if i = 3
    in our example, the $R_i$ that goes with $B_i$ will be $R_B$
    since both the residual and the direct effect of
    Treatment B are the first ones in Block 3. Where i for
    $R_i$ is 2, $R_2$ will be $R_D$.
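A short sketch shows how the adjusted residual totals of step 10 can be obtained from the Table 18 data. Two points are assumptions of this sketch rather than statements from the source: the names used in the code, and the treatment of $B_0$ (needed when i = 1) as the last block, which mirrors the wrap-around used in the coefficient matrix of step 11.

```python
# Treatment order and fictitious scores, block by block (Table 18).
sequence = ["AFBECD", "DFCEAB", "BACFDE", "EBFADC", "CBDAEF", "FEDBCA"]
scores = [[3, 8, 9, 8, 9, 8],
          [10, 12, 11, 10, 8, 5],
          [7, 6, 7, 12, 13, 12],
          [14, 11, 12, 11, 9, 11],
          [11, 10, 11, 10, 11, 16],
          [18, 17, 15, 12, 11, 10]]
primer = "A"
t = len(sequence)

G = sum(sum(row) for row in scores)             # grand total
B = [sum(row) for row in scores]                # block totals B_1 ... B_t

# R_i: each score is credited to the treatment that preceded it,
# so the primer's residual (the first score) is included.
R = dict.fromkeys("ABCDEF", 0)
previous = primer
for block, row in zip(sequence, scores):
    for treatment, y in zip(block, row):
        R[previous] += y
        previous = treatment

# Residual number i belongs to the treatment that opens block i.
order = [block[0] for block in sequence]        # A, D, B, E, C, F

# Step 10: R'_i = t*R_i - G + B_(i-1) - B_i; B[i - 1] with i = 0 wraps
# around to the last block (an assumption of this sketch).
adjusted = [t * R[order[i]] - G + B[i - 1] - B[i] for i in range(t)]

print("G =", G, " B =", B)
print("R =", R)
print("order =", order)
print("adjusted =", adjusted)
```

The adjusted totals sum to zero because every score is credited to exactly one preceding treatment, so the residual totals add up to G.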
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11.  The  following  is  a  "cookbook"  description  of  the  steps 
required  to  estimate  the  reside!  effects  isolated  from 
block  effects  (Rn).  ?*'■':  _ne  numerical  example,  page  105. 

Constructing the coefficient matrix

a. Develop a coefficient matrix with t columns and t rows where t equals the number of treatments. Identify the columns as follows:

   Write the letter a above the first column. Then write subsequent letters in alphabetical order, first forward and then reversed, enough to cover the remaining columns in a symmetrical pattern.

   For example, when t = 6, the letters for the columns would be a, then b, c, d, c, b; when t = 7, the letters would be a, then b, c, d, d, c, b.

b. Write the numerical value equal to $(t^2 - 2)$ along the main diagonal of the matrix (upper left to lower right).


c.  Write  the  number  1  in  each  row  to  the  right  and  left 
of  the  diagonal.  Where  there  is  no  space  to  the  left 
(first  row)  or  right  (last  row) ,  the  number  1  is 
placed  at  the  opposite  end  of  that  row.  Put  zeros  in 
the  remaining  cells. 


        a          b          c      ...      d          c          b

    (t^2-2)       1          0      ...      0          0          1
       1       (t^2-2)       1      ...      0          0          0
       0          1       (t^2-2)   ...      0          0          0
       .          .          .               .          .          .
       0          0          0      ...   (t^2-2)       1          0
       0          0          0      ...      1       (t^2-2)       1
       1          0          0      ...      0          1       (t^2-2)
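Steps 11a through 11c amount to building a matrix with $t^2 - 2$ on the diagonal and a 1 on either side of it, wrapping around at the ends. A short sketch follows; the function names are hypothetical and t is assumed to be at least 3.

```python
def coefficient_matrix(t):
    """t x t matrix: t^2 - 2 on the diagonal, 1 in the neighbouring cells
    (wrapping around at the first and last rows), 0 elsewhere."""
    diagonal = t * t - 2
    matrix = [[0] * t for _ in range(t)]
    for i in range(t):
        matrix[i][i] = diagonal
        matrix[i][(i - 1) % t] = 1   # cell to the left, wrapping to the far end
        matrix[i][(i + 1) % t] = 1   # cell to the right, wrapping to the start
    return matrix


def column_letters(t):
    """Column labels of Step 11a: a, then letters forward and reversed."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return [alphabet[min(j, t - j)] for j in range(t)]


for row in coefficient_matrix(6):
    print(row)
print(column_letters(6), column_letters(7))
```

For t = 6 the diagonal value is 34, matching the matrix used in the numerical example.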


Writing,  solving,  and  inverting  the  normal  equations 


d. Write the normal equations as the sum of the products between each row coefficient and its corresponding columnar term (letter). Only [t/2 + 1], rounded down, of the t equations will be unique, the exact number being the number of different letters (terms) formed in Step 11a.

e. Set the equation in the first row equal to one. Set all of the remaining rows equal to zero.

f. Solve for unknown terms using the usual algebraic processes. Start with the bottom equation and work up, substituting the value of each term as it is determined. Be careful to maintain the correct arithmetic signs.

g.  Use  the  values  obtained  for  each  unknown  to  write  the 
first  row  of  the  inverted  matrix  by  placing  each  value 
under  the  appropriate  term  in  the  column  designated  in 
Step  11a.*  Complete  the  remaining  rows  of  the  inverted 
matrix  by  horizontally  rotating  the  first  row  to  the 
right,  one  column  at  a  time  for  each  succeeding  row. 
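Steps 11d through 11g can be sketched as follows: the last unknown is given a provisional value of 1, the equations are solved from the bottom up, and the first equation then supplies the common scale factor. The function names are illustrative assumptions, exact fractions are used so that numerators and denominators can be compared with Table 19, and t is assumed to be at least 3.

```python
from fractions import Fraction


def inverse_first_row(t):
    """First row of the inverse coefficient matrix, as exact fractions."""
    if t < 3:
        raise ValueError("this sketch assumes t >= 3")
    k = t * t - 2                 # diagonal value
    m = t // 2                    # index of the last distinct unknown
    x = [Fraction(0)] * (m + 1)
    x[m] = Fraction(1)            # provisional value, rescaled below
    if t % 2 == 0:
        x[m - 1] = -Fraction(k, 2) * x[m]   # bottom equation: 2x[m-1] + k x[m] = 0
    else:
        x[m - 1] = -(k + 1) * x[m]          # bottom equation: x[m-1] + (k+1) x[m] = 0
    for j in range(m - 1, 0, -1):           # work upward through the middle equations
        x[j - 1] = -k * x[j] - x[j + 1]
    scale = k * x[0] + 2 * x[1]             # first equation must equal one
    x = [value / scale for value in x]
    return [x[min(j, t - j)] for j in range(t)]


def inverse_matrix(t):
    """Remaining rows: rotate the first row one column to the right per row."""
    first = inverse_first_row(t)
    return [[first[(j - i) % t] for j in range(t)] for i in range(t)]


print([value * 665280 for value in inverse_first_row(6)])
```

Multiplying the t = 6 result by 665280 recovers the numerators 19601, -577, 17, -1, 17, -577 obtained in the numerical example.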

h. To solve for the residual effects with block effects removed, $R''_i$, we multiply the vector of $R'_i$ values by the inverse coefficient matrix. To do this, the elements of the first row of the inverse coefficient matrix are multiplied by the corresponding elements of the column of $R'_i$ values and summed. This sum is divided by the common denominator; the result will be the first element of the $R''_i$ column, i.e., $R''_1 = A_{11}R'_1 + A_{12}R'_2 + \cdots + A_{1n}R'_n$. Similarly, multiplication of elements from the second row of the inverse coefficient matrix with the corresponding elements of $R'_i$ values will give the second element in the $R''_i$ vector ($R''_2$). To simplify this, it helps to write the t $R'_i$ values in order above the columns of the inverse coefficient matrix. Then $R''_i$ will be the sum of the cross products of the $R'_i$ values and the corresponding coefficients in each row, where the value of i in $R''_i$ corresponds to row i in the matrix.
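The multiplication in Step 11h can be sketched as below, using the first-row numerators and common denominator for t = 6 and the adjusted residual totals from the numerical example. The function name is an assumption made for illustration; each row of the inverse is formed by rotating the first row to the right.

```python
from fractions import Fraction


def residual_effects(first_row_numerators, denominator, r_prime):
    """R''_i = (row i of the inverse matrix) . (R' vector) / denominator."""
    t = len(r_prime)
    effects = []
    for i in range(t):
        # Row i is the first row rotated i columns to the right.
        row = [first_row_numerators[(j - i) % t] for j in range(t)]
        cross_products = sum(c * r for c, r in zip(row, r_prime))
        effects.append(Fraction(cross_products, denominator))
    return effects


numerators = [19601, -577, 17, -1, 17, -577]   # columns a, b, c, d, c, b
r_prime = [-82, 13, -49, 49, -13, 82]          # R'_1 ... R'_6 in block order

r_double_prime = residual_effects(numerators, 665280, r_prime)
print([float(value) for value in r_double_prime])
```

The result is in block order, so the second value belongs to Treatment D and the third to Treatment B.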

12. $B''_i$ = block effects with residual effects removed = $(tB_i - G - tR''_i + tR''_{i+1})/t^2$ (where $R''_{t+1} = R''_1$)

13. $(\Sigma x_B^2 + \Sigma x_{R''}^2)$ = composite sum of squares for block and for residual eliminating blocks = $\Sigma R_i R''_i + \Sigma B_i B''_i$

14. $\Sigma x_{R''}^2$ = residual sum of squares eliminating blocks = $(\Sigma x_B^2 + \Sigma x_{R''}^2) - \Sigma x_B^2$ = (Step 13) - (Step 9)
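Steps 12 through 14 can be checked with a few lines of code. This sketch takes the $R''_i$ values and totals of the numerical example as given and keeps every list in block order; the names are hypothetical.

```python
def block_effects(block_totals, r_dd, grand_total):
    """B''_i = (t*B_i - G - t*R''_i + t*R''_(i+1)) / t^2, with R''_(t+1) = R''_1."""
    t = len(block_totals)
    effects = []
    for i in range(t):
        next_r = r_dd[(i + 1) % t]          # wraps from the last block to the first
        effects.append((t * block_totals[i] - grand_total
                        - t * r_dd[i] + t * next_r) / t ** 2)
    return effects


block_totals = [45, 56, 57, 68, 69, 83]        # B_1 ... B_6
residual_totals = [43, 67, 55, 73, 61, 79]     # R_1 ... R_6 (A, D, B, E, C, F)
r_dd = [-2.5, 0.5, -1.5, 1.5, -0.5, 2.5]       # R''_1 ... R''_6
ss_blocks_ignoring_residuals = 145             # Step 9

b_dd = block_effects(block_totals, r_dd, 378)
composite_ss = (sum(r * e for r, e in zip(residual_totals, r_dd))
                + sum(b * e for b, e in zip(block_totals, b_dd)))   # Step 13
residual_ss = composite_ss - ss_blocks_ignoring_residuals           # Step 14
print(b_dd, composite_ss, residual_ss)
```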


*Sampford (1957, p 302) provides solutions for the first row of inverse matrices for t = 6 to 10 inclusive. These are given in Table 19.

TABLE 19. FIRST ROW VALUES OF INVERSE MATRIX

  Column        t = 6       t = 7        t = 8          t = 9           t = 10

     1          19601      105937      7380481       39424240       4517251249
     2           -577       -2255      -119071        -499121        -46099201
     3             17          48         1921           6319           470449
     4             -1          -1          -31            -80            -4801
     5             17          -1            1              1               49
     6           -577          48          -31              1               -1
     7                      -2255         1921            -80               49
     8                                 -119071           6319            -4801
     9                                                -499121           470449
    10                                                               -46099201

  Denominator  [665280]   [4974529]  [457351680]   [3113516718]   [442598424000]

(From Sampford, 1957)

15. $\Sigma x_E^2$ = error sum of squares = composite total sum of squares less treatment sum of squares, residual sum of squares (eliminating blocks), and block sum of squares (ignoring residuals) = $\Sigma x_G^2 - [\Sigma x_T^2 + (\Sigma x_B^2 + \Sigma x_{R''}^2)]$ = (Step 7) - [(Step 8) + (Step 13)]
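The subtraction in Step 15, together with the degrees of freedom listed in Step 16, can be sketched as follows. The function names and dictionary keys are illustrative assumptions; the error degrees of freedom are obtained as the total less three sources of t - 1 each, which equals $t^2 - 3t + 2$.

```python
def error_sum_of_squares(total_ss, treatment_ss, composite_ss):
    """Step 15: total less treatments and the block/residual composite."""
    return total_ss - (treatment_ss + composite_ss)


def degrees_of_freedom(t):
    """Degrees of freedom for each source in a t x t design of this type."""
    total = t * t - 1
    per_source = t - 1
    return {
        "total": total,
        "direct": per_source,
        "residual": per_source,
        "blocks": per_source,
        "error": total - 3 * per_source,   # equals t^2 - 3t + 2
    }


print(error_sum_of_squares(345, 105, 240))
print(degrees_of_freedom(6))
```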


16. Sources, sums of squares, and degrees of freedom.

    Source              $\Sigma x^2$            d.f.

    Total               $\Sigma x_G^2$          $t^2 - 1$

    Treatments:
      Direct            $\Sigma x_T^2$          $t - 1$
      Residual          $\Sigma x_{R''}^2$      $t - 1$

    Blocks              $\Sigma x_B^2$          $t - 1$

    Error               $\Sigma x_E^2$          $t^2 - 3t + 2$

NUMERICAL EXAMPLE*


This  numerical  example  follows  the  steps  set  forth  in  the 
"Method  of  Analysis"  section.  The  full  design  for  six  treatments 
and  the  data  on  which  this  analysis  is  based  are  given  in 
Table  18. 


1. $N = t^2$; t = 6, N = 36

2. G = 3 + 8 + 9 + 8 + 9 + 10 + 12 + ... + 11 + 10 = 378

3. M = 378/36 = 10.5

4. $T_A$ = 3 + 8 + 6 + 11 + 10 + 10 = 48
   $T_B$ = 9 + 5 + 7 + 11 + 10 + 12 = 54
   $T_C$ = 60, $T_D$ = 66, $T_E$ = 72, $T_F$ = 78

5. $B_1$ = 3 + 8 + 9 + 8 + 9 + 8 = 45
   $B_2$ = 10 + 12 + 11 + 10 + 8 + 5 = 56
   $B_3$ = 57, $B_4$ = 68, $B_5$ = 69, $B_6$ = 83

6. $R_A$ = 3 + 8 + 5 + 7 + 9 + 11 = 43
   $R_B$ = 8 + 7 + 6 + 12 + 11 + 11 = 55
   $R_C$ = 61, $R_D$ = 67, $R_E$ = 73, $R_F$ = 79

7. $\Sigma x_G^2 = [3^2 + 8^2 + 9^2 + \cdots + 15^2 + 12^2 + 11^2 + 10^2] - (378)^2/36 = 345$

8. $\Sigma x_T^2 = [(48)^2 + (54)^2 + (60)^2 + (66)^2 + (72)^2 + (78)^2]/6 - (378)^2/36 = 105$

9. $\Sigma x_B^2 = [(45)^2 + (56)^2 + (57)^2 + (68)^2 + (69)^2 + (83)^2]/6 - (378)^2/36 = 145$

10. $R'_i = tR_i - G + B_{i-1} - B_i$

    $R_A = R_1$, $R_B = R_3$, $R_C = R_5$, $R_D = R_2$, $R_E = R_4$, $R_F = R_6$

    $B_0 = B_6$, $B_{2-1} = B_1$, etc.

    $R'_1 = R'_A$ = 6(43) - 378 + 83 - 45 = -82
    $R'_2 = R'_D$ = 6(67) - 378 + 45 - 56 = +13
    $R'_3 = R'_B$ = -49
    $R'_4 = R'_E$ = +49
    $R'_5 = R'_C$ = -13
    $R'_6 = R'_F$ = +82


*This example was prepared by Dr. Howard B. Lee,

11a, b, c.

      a     b     c     d     c     b

     34     1     0     0     0     1
      1    34     1     0     0     0
      0     1    34     1     0     0
      0     0     1    34     1     0
      0     0     0     1    34     1
      1     0     0     0     1    34


11d, e.

    34a + 2b          = 1
    a + 34b + c       = 0
    b + 34c + d       = 0
    2c + 34d          = 0
    (the remaining two rows repeat equations already written)

11f.

    (1)  2c = -34d;  c = -17d
    (2)  b = -34(-17d) - d = +577d
    (3)  a = -34(+577d) - (-17d) = -19601d
    (4)  34(-19601d) + 2(+577d) = -665280d = +1

         d = -1/665280

    Substitute the numerator for d in the other equations (with d = -1: c = +17, b = -577, a = +19601). Since the denominator is constant to all terms, it will be held out until later.

11g.

    $R'_i$:     -82      +13      -49      +49      -13      +82

                 a        b        c        d        c        b

    $R''_1$    19601     -577       17       -1       17     -577
    $R''_2$     -577    19601     -577       17       -1       17
    $R''_3$       17     -577    19601     -577       17       -1
    $R''_4$       -1       17     -577    19601     -577       17
    $R''_5$       17       -1       17     -577    19601     -577
    $R''_6$     -577       17       -1       17     -577    19601

    (All entries are to be divided by the common denominator, 665280.)

11h.

    $R''_{1(A)}$ = [(19601)(-82) + (-577)(13) + (17)(-49) + (-1)(+49) + (17)(-13) + (-577)(+82)] / 665280 = -2.5

    $R''_{2(D)}$ = [(-577)(-82) + (19601)(13) + (-577)(-49) + (17)(49) + (-1)(-13) + (17)(82)] / 665280 = .5

    $R''_{3(B)}$ = -1.5

    $R''_{4(E)}$ = 1.5

    $R''_{5(C)}$ = -.5

    $R''_{6(F)}$ = 2.5


12. $B''_1$ = [6(45) - 378 - 6(-2.5) + 6(.5)]/36 = -2.5
    $B''_2$ = [6(56) - 378 - 6(.5) + 6(-1.5)]/36 = -1.5
    $B''_3$ = -.5, $B''_4$ = .5, $B''_5$ = 1.5, $B''_6$ = 2.5

13. $[\Sigma x_B^2 + \Sigma x_{R''}^2]$ = [(43)(-2.5) + (55)(-1.5) + (61)(-.5) + (67)(.5) + (73)(1.5) + (79)(2.5)]
    + [(45)(-2.5) + (56)(-1.5) + (57)(-.5) + (68)(.5) + (69)(1.5) + (83)(2.5)] = 240

14. $\Sigma x_{R''}^2$ = 240 - 145 = 95

15. $\Sigma x_E^2$ = 345 - [105 + 240] = 0

    [Note: While this is so for this fictitious example, the error would ordinarily not be equal to zero.]


16. ANOVA Table*

    Source                           $\Sigma x^2$    Degrees of Freedom    Variance

    Direct ($\Sigma x_T^2$)              105                 5                 21
    Residual ($\Sigma x_{R''}^2$)         95                 5                 19
    Blocks ($\Sigma x_B^2$)              145                 5                 29
    Error ($\Sigma x_E^2$)                 0                20                  0

*No test was possible in this fictitious example since there was no error variance.

SECTION XV

DESIGN ECONOMY WHEN EXPERIMENTAL FACTORS
SELECTIVELY AFFECT BI-VARIATE CRITERIA


There  can  be  times  when  some  experimental  factors  will  be 
expected  to  affect  one  set  of  criterion  measures  and  other 
experimental  factors  will  be  expected  to  affect  a  different 
set  of  criterion  measures.  In  multifactor,  multiple  criteria 
experiments,  if  such  pairings  do  occur  between  independent 
and  dependent  variables  with  infrequent  overlaps,  Daniel  (1960) 
has shown how additional economy can be achieved when $2^k$ or
$2^{k-p}$ data collection patterns are employed.

Let us suppose that in the AWAVS investigation of parameters for a pilot-training simulation of carrier landings, the investigator has good reasons to believe that the FLOLS display will have a significant effect on vertical deviations from the glideslope but not the horizontal, that wind gusts across the flight path will significantly affect horizontal deviations but not the vertical, and that lag between responses of the visual and motion systems will affect both measures. In an investigation to quantify these effects, the pattern of critical effects from the different sources of variance in a $2^3$ factorial design would look like this:


                              Criteria
    Source            Vertical ($y_1$)    Horizontal ($y_2$)

    FLOLS   A               X
    LAG     B               X                    X
            AB              X
    WIND    C                                    X
            AC*
            BC                                   X
            ABC*

An X is placed in the criterion column affected by the source on the left. The sources with asterisks affect neither criterion.


It is apparent that the same information would be obtained were two $2^2$ experiments run, one to study the effects of A and B on the vertical measure, the other to study the effects of B and C on the horizontal measure. In each case, two main effects and one two-factor interaction could be estimated correctly. If this were done, however, no economy would have been achieved by running the two four-condition experiments rather than a single eight-condition one.

If we arrange the sources and effects into groups of influence in this manner:

      $y_1$      $y_2$      Neither

       A         BC
       B         B          AC
       AB        C
                            ABC

those familiar with aliasing in fractional factorials will note that the effects are combined as they would be in a $2^{3-1}$ fractional factorial design:

    (A + BC)

    (B + AC)

    (AB + C)

each aliased pair being associated with one degree of freedom.

In  this  fraction,  the  defining  generator  is: 

I  =  ABC 

As such, effect ABC cannot be estimated. However, since neither it nor effect AC is believed to have a critical effect on $y_1$ or $y_2$, no information will be lost. Furthermore, since effects C, AC, and BC are negligible on criterion $y_1$, the aliasing will not bias the estimates of A, B, and AB. Similarly, since A, AB, and AC are believed to have negligible effects on $y_2$, estimates of B, C, and BC will not be biased by being aliased with them, insofar as the second criterion is concerned. By aliasing* those effects that do not affect the same performance criterion, we are able to cut the size of the experiment in half. Only the following four (out of eight) experimental conditions are needed to complete this half-replicate design:

    a, b, c, abc

The other half-fraction, I = -ABC, might have been used. This would involve the experimental conditions: (1), ab, ac, bc.
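The two half-fractions and the alias pairs described above can be generated mechanically. The following is a small sketch under the usual -1/+1 coding of factor levels; the function names are assumptions introduced for illustration. A run belongs to the fraction I = ABC when the product of its three coded levels is +1, and the alias of an effect is found by multiplying it by the defining word and cancelling repeated letters.

```python
from itertools import product

FACTORS = "ABC"


def half_fraction(sign):
    """Runs of the 2^3 design whose ABC contrast equals sign (+1 or -1)."""
    runs = []
    for levels in product((-1, 1), repeat=len(FACTORS)):
        if levels[0] * levels[1] * levels[2] == sign:
            # Standard notation: a lower-case letter marks a factor at its high level.
            label = "".join(f.lower() for f, lv in zip(FACTORS, levels) if lv == 1)
            runs.append(label or "(1)")
    return runs


def alias(effect, defining_word="ABC"):
    """Multiply an effect by the defining word; squared letters cancel."""
    letters = set(effect) ^ set(defining_word)
    return "".join(sorted(letters)) or "I"


print(sorted(half_fraction(+1)))    # fraction for I = ABC
print(sorted(half_fraction(-1)))    # fraction for I = -ABC
print({effect: alias(effect) for effect in ("A", "B", "AB")})
```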

Daniel  (1960,  pp  266-267)  supplies  the  defining  generators 
for  a  number  of  eight-  and  sixteen-condition  designs  involving 
from  four  to  eight  factors  with  different  "influence  patterns," 
i.e.,  1-2-1,  2-1-2,  2-2-2,  3-0-3,  3-1-3,  and  4-0-4.  An 
influence  pattern  is  a  notation  Daniel  uses  to  describe  the 
independent-dependent  factor  pairings  when  only  two  criteria 
are involved. The influence pattern for the example described in this section would be: 1-1-1, corresponding to the letters A-B-C respectively. The first term indicates the number of factors that affect only $y_1$; the last term, the number affecting only $y_2$; and the middle term, the number affecting both criteria.

Since one cannot always be certain that the particular influences will occur as assumed, designs should be selected so that the incorrectness of the assumptions might be detected. This would seem to set the requirement that at least Resolution IV designs should be used so that no main effects will be confounded with any other main effects nor with two-factor interactions, which would remain in strings. If the design is not saturated, the characteristics of the three-factor interaction strings may provide clues to the correctness of the assumptions and may also serve as an estimate of error to the extent that they are negligible.

It  may  seem  at  this  point  that  we  have  made  a  complete  circle, 
beginning  with  a  new  technique  for  effecting  economy  but  ending  up 
with  the  same  size  design  that  one  would  have  used  anyway  in  a 
conventional  screening  design  with  one  or  more  criteria.  This  is 
not  quite  the  case.  Instead,  the  technique  provides  a  useful  and 
different  way  of  looking  at  a  problem  in  experimental  design  and 
can  be  most  effective  and  economical  when  the  influence  pattern 
is  well  defined.  It  provides  an  additional  basis  for  deciding 
how  to  assign  factors  to  the  experimental  design  structure,  which 
ones  to  alias  and  which  to  isolate.  Furthermore,  with  clues  from 
initial  blocks  of  data  available  to  verify  initial  assumptions  of 
influence  patterns,  the  size  of  the  effort  required  to  isolate 
two-factor  interactions  in  strings  for  purposes  of  screening  or 
developing response surfaces will be reduced since certain combinations
will not be expected to influence either criterion.

Daniel (1960) proposes a pre-experiment analysis of the variables, an operation that is already a part of Phase I of the "new paradigm" (Simon, 1977b) when economical multifactor designs are used with a single criterion, "to summarize the experimenter's knowledge and feelings about the effects of each of K factors on each of r Responses." He writes (p. 268):

    A K x R "influence matrix" has been useful both in aiding the
    statistician to understand the limitations and advantages of the
    experimenter's technical background, and to record the
    experimenter's state of belief before the new round of
    experiments is started. Entries of -1, 0, +1 can be used to
    indicate the experimenter's opinions about the sign and
    magnitude of real effects. An i can be used to indicate
    ignorance.
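For the two-criterion case, an influence matrix of this kind can be reduced to the influence-pattern notation described earlier in this section. The sketch below is illustrative only: the function name is hypothetical, the +1 entries for the carrier-landing example are placeholders (the text states which criteria each factor affects but not the signs), and an entry of "i" is treated as a possible effect, which is an assumption made here and not a rule given by Daniel.

```python
def influence_pattern(influence_matrix):
    """Reduce a K x 2 influence matrix to the 'only y1 - both - only y2' pattern.

    influence_matrix maps each factor to its (y1, y2) entries, each being
    -1, 0, +1, or "i" for ignorance. Any entry other than 0 counts as an
    influence in this sketch.
    """
    only_first = both = only_second = 0
    for entry_y1, entry_y2 in influence_matrix.values():
        affects_y1 = entry_y1 != 0
        affects_y2 = entry_y2 != 0
        if affects_y1 and affects_y2:
            both += 1
        elif affects_y1:
            only_first += 1
        elif affects_y2:
            only_second += 1
    return f"{only_first}-{both}-{only_second}"


# Carrier-landing example; the signs are placeholders, not values from the text.
example = {
    "A (FLOLS)": (+1, 0),    # vertical deviations only
    "B (LAG)":   (+1, +1),   # both criteria
    "C (WIND)":  (0, +1),    # horizontal deviations only
}
print(influence_pattern(example))
```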


Daniel  suggests  that  it  might  be  helpful  if  criteria  could  be 
classified  according  to  types,  for  example:  1)  those  that 
measure  similar  properties,  in  the  same  units,  e.g.,  vertical 
deviations  from  glideslope  before  and  after  training;  2)  those 
that  measure  similar  or  related  properties,  not  of  the  same 
dimensions, e.g., vertical and horizontal deviations from the
flightpath; 3) those that are based on qualitatively different
properties, e.g., cost and performance on a particular simulation
configuration.